extract_structured

Give the extractor a JSON Schema and a natural-language prompt. The LLM reads the page and returns data matching your schema. When no LLM provider is configured, the tool falls back to CSS selector extraction using your selectorHints.

Use Cases

Schema-First Product Extraction

Define the fields you want once; the LLM maps any e-commerce site to your schema.

Resume & Document Parsing

Extract candidate names, skills, and work history directly into a typed object.

Knowledge Graph Seeding

Extract entities and relationships from articles into structured JSON for graph loaders.
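As an illustration of the knowledge-graph use case, the sketch below defines one possible schema and flattens the result into triples for a graph loader. The field names (entities, relationships, source, relation, target) and the helper to_triples are hypothetical, and the sketch assumes the extractor accepts nested arrays and objects in the schema, which this page does not confirm.

```python
from typing import Any

# Hypothetical schema: these field names are illustrative, not prescribed by the API.
GRAPH_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "kind": {"type": "string"},
                },
                "required": ["name"],
            },
        },
        "relationships": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "source": {"type": "string"},
                    "relation": {"type": "string"},
                    "target": {"type": "string"},
                },
                "required": ["source", "relation", "target"],
            },
        },
    },
    "required": ["entities"],
}


def to_triples(extracted: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Flatten the extracted relationships into (source, relation, target) triples."""
    return [
        (rel["source"], rel["relation"], rel["target"])
        for rel in extracted.get("relationships", [])
    ]
```

GRAPH_SCHEMA would be sent as the schema parameter, and to_triples applied to data.extracted in the response.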

Endpoint

POST /api/v1/tools/extract_structured

Auth Required

2 req/s on Free plan

3 credits
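To stay under the 2 req/s Free-plan limit, a client can space out its own calls. The following is a minimal sketch of a client-side throttle; the Throttle class is hypothetical and not part of any CrawlForge SDK, and how the server actually counts requests (fixed window, sliding window, burst allowance) is not stated here, so treat the even spacing as a conservative assumption.

```python
import time
from typing import Callable


class Throttle:
    """Space out calls so they are at least 1/rate seconds apart."""

    def __init__(
        self,
        rate: float = 2.0,  # requests per second (Free plan figure from this page)
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this call's slot.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Call throttle.wait() immediately before each request; the injectable clock and sleep arguments exist only to make the class easy to test.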

Parameters

LLM vs. selector fallback: Provide llmConfig to use LLM-powered extraction. Without it, the tool uses selectorHints for deterministic CSS extraction, which is cheaper and requires no LLM key.

Name	Type	Required	Default	Description	Example
url	string	Required	-	URL to extract data from	https://example.com/product/123
schema	object	Required	-	JSON Schema describing the data to extract	{"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}},"required":["title"]}
prompt	string	Optional	-	Natural-language instructions guiding the LLM extraction	Extract the product name, current price, and whether it is in stock
llmConfig	object	Optional	-	Optional LLM provider configuration (provider, apiKey). Omit to use CSS selector fallback.	{"provider": "openai", "apiKey": "sk-..."}
selectorHints	object	Optional	-	CSS selector hints to guide extraction (also used by selector fallback)	{"title": "h1.product-title", "price": ".price"}
fallbackToSelectors	boolean	Optional	true	Fall back to CSS selector extraction when LLM is unavailable	true
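The parameters above can be assembled into a request body with a small helper. This is a sketch only: build_extract_payload and expected_method are hypothetical names, and expected_method simply mirrors the documented rule that a configured provider yields "llm" and anything else yields "selector_fallback".

```python
from typing import Any, Optional


def build_extract_payload(
    url: str,
    schema: dict[str, Any],
    prompt: Optional[str] = None,
    llm_config: Optional[dict[str, str]] = None,
    selector_hints: Optional[dict[str, str]] = None,
    fallback_to_selectors: bool = True,
) -> dict[str, Any]:
    """Assemble a request body, leaving out optional fields that were not set."""
    if not url:
        raise ValueError("url is required")
    if not isinstance(schema, dict) or not schema:
        raise ValueError("schema is required and must be a JSON Schema object")

    payload: dict[str, Any] = {
        "url": url,
        "schema": schema,
        "fallbackToSelectors": fallback_to_selectors,
    }
    if prompt is not None:
        payload["prompt"] = prompt
    if llm_config is not None:
        payload["llmConfig"] = llm_config
    if selector_hints is not None:
        payload["selectorHints"] = selector_hints
    return payload


def expected_method(payload: dict[str, Any]) -> str:
    """Predict data.extraction_method from the payload, per the rule described above."""
    return "llm" if "llmConfig" in payload else "selector_fallback"
```

Passing the returned dictionary as the JSON body of the POST request keeps the LLM and selector variants in one code path.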

Request Examples

cURL — LLM extraction

terminal (Bash)

curl -X POST https://crawlforge.dev/api/v1/tools/extract_structured \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["title", "price"]
    },
    "prompt": "Extract the product name, price in USD, and availability",
    "llmConfig": { "provider": "openai", "apiKey": "sk-..." }
  }'

TypeScript — selector fallback

extractStructured.ts (TypeScript)

const response = await fetch('https://crawlforge.dev/api/v1/tools/extract_structured', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/product/123',
    schema: {
      type: 'object',
      properties: {
        title: { type: 'string' },
        price: { type: 'number' },
      },
      required: ['title'],
    },
    selectorHints: {
      title: 'h1.product-title',
      price: '.price-value',
    },
    fallbackToSelectors: true,
  }),
});

const data = await response.json();
if (data.success) {
  console.log(data.data.extracted.title, data.data.extracted.price);
}

Python

extract_structured.py (Python)

import os

import requests

response = requests.post(
    'https://crawlforge.dev/api/v1/tools/extract_structured',
    headers={
        'X-API-Key': os.environ['CRAWLFORGE_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com/article/42',
        'schema': {
            'type': 'object',
            'properties': {
                'headline': {'type': 'string'},
                'author': {'type': 'string'},
                'published_at': {'type': 'string'},
                'tags': {'type': 'array', 'items': {'type': 'string'}},
            },
            'required': ['headline'],
        },
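        'prompt': 'Extract headline, author, publish date (ISO 8601), and tags',
        # The prompt only guides LLM extraction, so a provider is configured here.
        'llmConfig': {'provider': 'openai', 'apiKey': 'sk-...'},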
    },
)

data = response.json()
if data['success']:
    print(data['data']['extracted'])

Response Example

200 OK · 1.2s

{
  "success": true,
  "data": {
    "url": "https://example.com/product/123",
    "extracted": {
      "title": "Premium Wireless Headphones",
      "price": 299.99,
      "in_stock": true
    },
    "extraction_method": "llm",
    "schema_fields": 3,
    "required_fields": 2,
    "llm_provider": "openai",
    "confidence": 0.92
  },
  "credits_used": 3,
  "credits_remaining": 997,
  "processing_time": 1240
}

Field Descriptions

data.extracted: Matches the JSON Schema you provided

data.extraction_method: "llm" when provider configured, "selector_fallback" otherwise

data.confidence: Extractor confidence (LLM confidence or selector match rate)

credits_used: Flat 3 credits per call
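A caller will usually want to check these fields before trusting the result. The sketch below shows one way to do that; read_extraction is a hypothetical helper, and the 0.8 confidence threshold is an arbitrary assumption rather than a documented recommendation.

```python
from typing import Any


def read_extraction(
    body: dict[str, Any],
    required: list[str],
    min_confidence: float = 0.8,  # assumed threshold; tune for your data
) -> dict[str, Any]:
    """Return data.extracted, or raise if the response looks unusable."""
    if not body.get("success"):
        raise RuntimeError("extraction did not succeed")

    data = body["data"]
    extracted = data["extracted"]

    # Confirm the fields your schema marks as required actually came back.
    missing = [name for name in required if name not in extracted]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    confidence = data.get("confidence", 0.0)
    if confidence < min_confidence:
        method = data.get("extraction_method", "unknown")
        raise ValueError(f"confidence {confidence} too low for method {method}")

    return extracted
```

With the response example above, read_extraction(body, ["title", "price"]) returns the extracted product object, since its confidence of 0.92 clears the assumed threshold.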

Credit Cost