AI Tools · 3 credits

extract_structured

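Give the extractor a JSON Schema and a natural-language prompt. The LLM reads the page and returns data that conforms to your schema. When no LLM provider is configured, it falls back to CSS selector extraction guided by the hints you supply.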

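Use Cases

Schema-first product extraction

Define the fields you want once; the LLM maps any e-commerce site onto your schema.

Resume and document parsing

Extract candidate names, skills, and work history directly into a typed object.

Knowledge graph population

Extract entities and relationships from articles into structured JSON that a graph loader can consume.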
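As a rough illustration of the knowledge-graph use case, the sketch below shows one possible schema for entities and relationships, plus a helper that flattens an extracted object into (source, relation, target) triples. The field names (`entities`, `relationships`, `source`, `relation`, `target`) and the `to_triples` helper are assumptions made for this example; they are not defined by the API.

```python
from typing import Any

# Hypothetical schema: field names are illustrative, not defined by the API.
KNOWLEDGE_GRAPH_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "kind": {"type": "string"},
                },
                "required": ["name"],
            },
        },
        "relationships": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "source": {"type": "string"},
                    "relation": {"type": "string"},
                    "target": {"type": "string"},
                },
                "required": ["source", "relation", "target"],
            },
        },
    },
    "required": ["entities"],
}


def to_triples(extracted: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Flatten an extracted object into (source, relation, target) triples.

    Relationships that point at a name missing from `entities` are skipped,
    so a graph loader never receives a dangling edge.
    """
    known = {e["name"] for e in extracted.get("entities", [])}
    triples = []
    for rel in extracted.get("relationships", []):
        if rel["source"] in known and rel["target"] in known:
            triples.append((rel["source"], rel["relation"], rel["target"]))
    return triples
```

Passing `KNOWLEDGE_GRAPH_SCHEMA` as the `schema` parameter and running `to_triples` on `data.extracted` would yield rows that most graph loaders can ingest directly.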

Endpoint

POST /api/v1/tools/extract_structured

Auth Required

Free plan: 2 req/s

3 credits
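To stay under the Free plan limit of 2 req/s, a client can space out its own calls. The following is a minimal sketch of a client-side throttle. The `Throttle` class and its injectable `clock` and `sleep` parameters are assumptions for this example; how the server responds when the limit is exceeded is not specified on this page.

```python
import time
from typing import Callable


class Throttle:
    """Spaces calls at least 1/rate seconds apart (hypothetical helper)."""

    def __init__(
        self,
        rate_per_sec: float = 2.0,  # Free plan limit stated on this page
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following call one interval after this one starts.
        self._next_allowed = max(now, self._next_allowed) + self._min_interval
        return delay
```

Calling `throttle.wait()` immediately before each request keeps a single-process client at or below the configured rate.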

Parameters

LLM vs. selector fallback: Provide llmConfig to use LLM-driven extraction. Without it, the tool uses selectorHints for deterministic CSS extraction, which is cheaper and requires no LLM key.

Name	Type	Required	Default	Description
url	string	Required	-	URL to extract data from. Example: https://example.com/product/123
schema	object	Required	-	JSON Schema describing the data to extract. Example: {"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}},"required":["title"]}
prompt	string	Optional	-	Natural-language instructions that guide the LLM extraction. Example: Extract the product name, current price, and whether it is in stock
llmConfig	object	Optional	-	LLM provider configuration (provider, apiKey). Omit it to use the CSS selector fallback. Example: {"provider": "openai", "apiKey": "sk-..."}
selectorHints	object	Optional	-	CSS selector hints that guide extraction (also used by the selector fallback). Example: {"title": "h1.product-title", "price": ".price"}
fallbackToSelectors	boolean	Optional	true	Fall back to CSS selector extraction when the LLM is unavailable. Example: true
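The parameters above combine into two request shapes: LLM-driven when `llmConfig` is present, selector-based when it is omitted. The sketch below assembles either body. `build_payload` is a hypothetical helper written for this example; only the JSON key names come from the table.

```python
from typing import Any, Optional


def build_payload(
    url: str,
    schema: dict[str, Any],
    prompt: Optional[str] = None,
    llm_config: Optional[dict[str, str]] = None,
    selector_hints: Optional[dict[str, str]] = None,
    fallback_to_selectors: bool = True,
) -> dict[str, Any]:
    """Assemble a request body, omitting optional fields that are unset."""
    payload: dict[str, Any] = {
        "url": url,
        "schema": schema,
        "fallbackToSelectors": fallback_to_selectors,
    }
    if prompt is not None:
        payload["prompt"] = prompt
    if llm_config is not None:
        # Present: the tool uses LLM-driven extraction.
        payload["llmConfig"] = llm_config
    if selector_hints:
        # Used as hints for the LLM and by the CSS selector fallback.
        payload["selectorHints"] = selector_hints
    return payload
```

Leaving unset optional keys out of the body, rather than sending them as `null`, mirrors the request examples on this page.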

Request Examples

cURL: LLM extraction

terminal (Bash)

curl -X POST https://crawlforge.dev/api/v1/tools/extract_structured \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["title", "price"]
    },
    "prompt": "Extract the product name, price in USD, and availability",
    "llmConfig": { "provider": "openai", "apiKey": "sk-..." }
  }'

TypeScript: selector fallback

extractStructured.ts (TypeScript)

const response = await fetch('https://crawlforge.dev/api/v1/tools/extract_structured', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/product/123',
    schema: {
      type: 'object',
      properties: {
        title: { type: 'string' },
        price: { type: 'number' },
      },
      required: ['title'],
    },
    // No llmConfig is sent, so the tool extracts with these CSS selectors.
    selectorHints: {
      title: 'h1.product-title',
      price: '.price-value',
    },
    fallbackToSelectors: true,
  }),
});

const data = await response.json();
if (data.success) {
  console.log(data.data.extracted.title, data.data.extracted.price);
}

Python

extract_structured.py (Python)

import os

import requests

response = requests.post(
    'https://crawlforge.dev/api/v1/tools/extract_structured',
    headers={
        'X-API-Key': os.environ['CRAWLFORGE_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com/article/42',
        'schema': {
            'type': 'object',
            'properties': {
                'headline': {'type': 'string'},
                'author': {'type': 'string'},
                'published_at': {'type': 'string'},
                'tags': {'type': 'array', 'items': {'type': 'string'}},
            },
            'required': ['headline'],
        },
        'prompt': 'Extract headline, author, publish date (ISO 8601), and tags',
    },
)

data = response.json()
if data['success']:
    print(data['data']['extracted'])

Response Example

200 OK (1.2s)

{
  "success": true,
  "data": {
    "url": "https://example.com/product/123",
    "extracted": {
      "title": "Premium Wireless Headphones",
      "price": 299.99,
      "in_stock": true
    },
    "extraction_method": "llm",
    "schema_fields": 3,
    "required_fields": 2,
    "llm_provider": "openai",
    "confidence": 0.92
  },
  "credits_used": 3,
  "credits_remaining": 997,
  "processing_time": 1240
}
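Because each call costs a fixed 3 credits, `credits_remaining` translates directly into a call budget. A small sketch of that arithmetic follows; `calls_remaining` is a hypothetical helper, not part of the API.

```python
CREDITS_PER_CALL = 3  # fixed cost per extract_structured call, per this page


def calls_remaining(credits_remaining: int) -> int:
    """Whole number of further calls the remaining balance can pay for."""
    return max(0, credits_remaining) // CREDITS_PER_CALL


print(calls_remaining(997))  # 332
```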

Field Descriptions

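data.extracted: Matches the JSON Schema you provided

data.extraction_method: "llm" when a provider is configured, otherwise "selector_fallback"

data.confidence: Extractor confidence (LLM confidence or selector match rate)

credits_used: Fixed at 3 credits per call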
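These fields are enough for a basic acceptance check on the client. As a sketch, the helper below rejects a response whose required schema fields are missing or whose confidence falls below a threshold. `check_result` and the 0.8 default threshold are assumptions for this example, not documented behavior.

```python
from typing import Any


def check_result(
    response: dict[str, Any],
    schema: dict[str, Any],
    min_confidence: float = 0.8,  # arbitrary threshold chosen for illustration
) -> dict[str, Any]:
    """Return the extracted object, or raise ValueError if it looks unusable."""
    if not response.get("success"):
        raise ValueError("extraction failed")

    data = response["data"]
    extracted = data["extracted"]

    # Required fields come from the schema that was sent with the request.
    missing = [k for k in schema.get("required", []) if extracted.get(k) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # confidence is an LLM confidence or a selector match rate, depending on
    # extraction_method, so one threshold may not suit both modes.
    confidence = data.get("confidence", 0.0)
    if confidence < min_confidence:
        method = data.get("extraction_method")
        raise ValueError(f"low confidence {confidence} via {method}")
    return extracted
```

Since `data.confidence` means different things for `"llm"` and `"selector_fallback"` results, a caller may prefer a separate threshold per `extraction_method`.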

Credit Cost