AI Tool · 3 credits
extract_structured
Give the extractor a JSON Schema and a natural-language prompt. The LLM reads the page and returns data matching your schema. When no LLM provider is configured, it falls back to CSS selector extraction using your selectorHints.
Use Cases
Schema-First Product Extraction
Define the fields you want once; the LLM maps any e-commerce site to your schema.
Resume & Document Parsing
Extract candidate names, skills, and work history directly into a typed object.
Knowledge Graph Seeding
Extract entities and relationships from articles into structured JSON for graph loaders.
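As a rough illustration of the knowledge-graph use case, the sketch below defines a schema with entities and relationships and flattens an extracted object into edge triples for a graph loader. The field names (entities, relationships, source, relation, target) and the helper to_edges are assumptions chosen for illustration; the API does not prescribe them.

```python
# Hypothetical schema for the knowledge-graph use case; field names are illustrative.
GRAPH_SCHEMA = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "kind": {"type": "string"},
                },
                "required": ["name"],
            },
        },
        "relationships": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "source": {"type": "string"},
                    "relation": {"type": "string"},
                    "target": {"type": "string"},
                },
                "required": ["source", "relation", "target"],
            },
        },
    },
    "required": ["entities"],
}


def to_edges(extracted: dict) -> list[tuple[str, str, str]]:
    """Flatten an extracted object into (source, relation, target) triples,
    skipping relationships whose endpoints are not listed as entities."""
    names = {e["name"] for e in extracted.get("entities", [])}
    return [
        (r["source"], r["relation"], r["target"])
        for r in extracted.get("relationships", [])
        if r["source"] in names and r["target"] in names
    ]
```

Dropping edges with unknown endpoints is one possible policy; a loader that creates nodes on demand could keep them instead.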
Endpoint
POST /api/v1/tools/extract_structured
Auth Required
2 req/s on Free plan
3 credits
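To stay under the 2 req/s limit of the Free plan, a client can space its calls out locally. The following is a minimal sketch of such a throttle; the Throttle class is hypothetical, and how the server actually measures the limit (fixed window, sliding window, burst allowance) is not documented here, so treat the even spacing as a conservative assumption.

```python
import time


class Throttle:
    """Client-side spacing of calls so that at most `rate` calls start per second."""

    def __init__(self, rate: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this call's start time.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Call throttle.wait() immediately before each request. The clock and sleep parameters are injectable only so the class can be exercised without real delays.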
Parameters
LLM vs. selector fallback: Provide llmConfig to use LLM-powered extraction. Without it, the tool uses selectorHints for deterministic CSS extraction, which is cheaper and requires no LLM key.

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Required | - | URL to extract data from. Example: https://example.com/product/123 |
| schema | object | Required | - | JSON Schema describing the data to extract. Example: {"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}},"required":["title"]} |
| prompt | string | Optional | - | Natural-language instructions guiding the LLM extraction. Example: Extract the product name, current price, and whether it is in stock |
| llmConfig | object | Optional | - | LLM provider configuration (provider, apiKey). Omit to use the CSS selector fallback. Example: {"provider": "openai", "apiKey": "sk-..."} |
| selectorHints | object | Optional | - | CSS selector hints to guide extraction (also used by the selector fallback). Example: {"title": "h1.product-title", "price": ".price"} |
| fallbackToSelectors | boolean | Optional | true | Fall back to CSS selector extraction when the LLM is unavailable. Example: true |
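The parameter table above can be turned into a small request-body builder. This is a sketch, not part of any official client: build_payload and expected_method are hypothetical helpers, and the only behavior assumed is what the table and the note state, namely that url and schema are required and that the presence of llmConfig selects LLM extraction.

```python
from typing import Optional


def build_payload(
    url: str,
    schema: dict,
    prompt: Optional[str] = None,
    llm_config: Optional[dict] = None,
    selector_hints: Optional[dict] = None,
    fallback_to_selectors: bool = True,
) -> dict:
    """Assemble a request body using the parameter names from the table."""
    if not url:
        raise ValueError("url is required")
    if not isinstance(schema, dict) or not schema:
        raise ValueError("schema is required and must be a JSON Schema object")

    payload = {
        "url": url,
        "schema": schema,
        "fallbackToSelectors": fallback_to_selectors,
    }
    # Optional fields are omitted entirely when not supplied.
    if prompt:
        payload["prompt"] = prompt
    if llm_config:
        payload["llmConfig"] = llm_config  # presence selects LLM extraction
    if selector_hints:
        payload["selectorHints"] = selector_hints  # used by the selector fallback
    return payload


def expected_method(payload: dict) -> str:
    """Guess the extraction_method the response should report for this payload."""
    return "llm" if "llmConfig" in payload else "selector_fallback"
```

The returned dict can be passed directly as the json argument of requests.post.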
Request Examples
cURL: LLM extraction
terminal (Bash)
curl -X POST https://crawlforge.dev/api/v1/tools/extract_structured \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["title", "price"]
    },
    "prompt": "Extract the product name, price in USD, and availability",
    "llmConfig": { "provider": "openai", "apiKey": "sk-..." }
  }'
TypeScript: selector fallback
extractStructured.ts (TypeScript)
// No llmConfig is sent, so the tool extracts with the CSS selectors in selectorHints.
// Top-level await requires an ES module context.
const response = await fetch('https://crawlforge.dev/api/v1/tools/extract_structured', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/product/123',
    schema: {
      type: 'object',
      properties: {
        title: { type: 'string' },
        price: { type: 'number' },
      },
      required: ['title'],
    },
    selectorHints: {
      title: 'h1.product-title',
      price: '.price-value',
    },
    fallbackToSelectors: true,
  }),
});
const data = await response.json();
// The extracted object mirrors the schema's properties.
if (data.success) {
  console.log(data.data.extracted.title, data.data.extracted.price);
}
Python
extract_structured.py (Python)
import os

import requests

response = requests.post(
    'https://crawlforge.dev/api/v1/tools/extract_structured',
    headers={
        'X-API-Key': os.environ['CRAWLFORGE_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com/article/42',
        'schema': {
            'type': 'object',
            'properties': {
                'headline': {'type': 'string'},
                'author': {'type': 'string'},
                'published_at': {'type': 'string'},
                'tags': {'type': 'array'},
            },
            'required': ['headline'],
        },
        'prompt': 'Extract headline, author, publish date (ISO 8601), and tags',
    },
    timeout=30,  # avoid hanging indefinitely on a slow page
)
data = response.json()
# The extracted object mirrors the schema's properties.
if data['success']:
    print(data['data']['extracted'])
Response Example
200 OK · 1.2s
{
  "success": true,
  "data": {
    "url": "https://example.com/product/123",
    "extracted": {
      "title": "Premium Wireless Headphones",
      "price": 299.99,
      "in_stock": true
    },
    "extraction_method": "llm",
    "schema_fields": 3,
    "required_fields": 2,
    "llm_provider": "openai",
    "confidence": 0.92
  },
  "credits_used": 3,
  "credits_remaining": 997,
  "processing_time": 1240
}
Field Descriptions
data.extracted: Matches the JSON Schema you provided
data.extraction_method: "llm" when a provider is configured, "selector_fallback" otherwise
data.confidence: Extractor confidence (LLM confidence or selector match rate)
credits_used: Flat 3 credits per call
Credit Cost
3 credits per request
Flat 3 credits whether the call uses the LLM or the selector fallback.
Tip: Pair with scrape_structured (2 credits, CSS-only) when you already have stable selectors and don't need LLM flexibility.
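The response fields and the flat pricing can be combined into a small client-side check. The sketch below assumes a decoded response body shaped like the response example; summarize is a hypothetical helper, and the 0.8 confidence threshold is an arbitrary illustration, not a documented cutoff.

```python
CREDITS_PER_CALL = 3           # extract_structured, flat per request
SCRAPE_STRUCTURED_CREDITS = 2  # scrape_structured, per the tip


def summarize(response: dict, schema: dict, min_confidence: float = 0.8) -> dict:
    """Check a decoded response body against the schema's required fields
    and estimate how many further calls the remaining credits cover."""
    if not response.get("success"):
        raise ValueError("request did not succeed")
    data = response["data"]
    extracted = data["extracted"]
    # Required fields from the schema that the extractor did not return.
    missing = [f for f in schema.get("required", []) if f not in extracted]
    remaining = response["credits_remaining"]
    return {
        "method": data["extraction_method"],
        "missing_required": missing,
        "low_confidence": data["confidence"] < min_confidence,
        "calls_left": remaining // CREDITS_PER_CALL,
        "scrape_structured_calls_left": remaining // SCRAPE_STRUCTURED_CREDITS,
    }
```

With 997 credits remaining, as in the response example, this reports 332 further extract_structured calls or 498 scrape_structured calls.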