extract_with_llm

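AI-powered extraction that runs on a local Ollama model by default: no API key is required, and your data never leaves your machine. When you need a hosted model, you can route requests to OpenAI or Anthropic instead. Give it a prompt (optionally with a JSON Schema) and it returns structured data.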

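Use Cases

Local-first extraction

Run extraction on Ollama on your own machine: zero LLM API cost, and private by default.

Schema-driven data lake

Combine a prompt with a JSON Schema to populate typed rows for your data warehouse or graph store.

Multi-provider failover

Start with local Ollama, then fall back to OpenAI or Anthropic on high-stakes pages by changing a single parameter.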
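The failover use case could be sketched as follows. This is a minimal illustration rather than an official client: `extract_with_failover` and the injected `send` callable are hypothetical names, and the sketch assumes a failed attempt shows up either as `success` being false in the response or as an exception raised by the transport.

```python
from typing import Any, Callable, Dict, Sequence

# Hypothetical transport: takes a request payload, returns the parsed JSON response.
Send = Callable[[Dict[str, Any]], Dict[str, Any]]


def extract_with_failover(
    send: Send,
    payload: Dict[str, Any],
    providers: Sequence[str] = ("ollama", "openai", "anthropic"),
) -> Dict[str, Any]:
    """Try each provider in order and return the first successful response."""
    last_response: Dict[str, Any] = {"success": False}
    for provider in providers:
        # Only the provider parameter changes between attempts.
        attempt = {**payload, "provider": provider}
        try:
            last_response = send(attempt)
        except Exception as exc:
            # "error" is a key local to this sketch, not a documented response field.
            last_response = {"success": False, "error": str(exc)}
            continue
        if last_response.get("success"):
            return last_response
    # Every provider failed; hand back the last failure for inspection.
    return last_response
```

Leaving `model` out of `payload` lets each provider fall back to its own default model, which avoids sending an Ollama model name to a hosted provider.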

Endpoint

POST /api/v1/tools/extract_with_llm

Auth Required

Free plan: 2 req/s

3 credits
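To stay within the Free plan's 2 req/s limit, a client can space out its calls. The sketch below shows one simple way to do that. The `Throttle` class is a hypothetical helper, not part of any CrawlForge SDK, and how the server responds when the limit is exceeded is not covered here.

```python
import time
from typing import Callable


class Throttle:
    """Spaces calls so that at most `max_per_second` start per second."""

    def __init__(
        self,
        max_per_second: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot one interval after this call's start time.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay
```

Calling `throttle.wait()` immediately before each request keeps consecutive requests at least 0.5 seconds apart with the default setting.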

Parameters

Ollama is the default: leave provider unset (or set it to "auto") and the tool runs on your local Ollama installation, with no LLM API key required. Set provider to "openai" or "anthropic" to use a hosted model instead.

Name	Type	Required	Default	Description
url	string	Optional	-	URL to fetch and extract from. One of url or content is required. Example: https://example.com/article/42
content	string	Optional	-	Raw text or HTML content to extract from. One of url or content is required. Example: "<html>...</html>"
prompt	string	Required	-	Natural-language instruction that guides the LLM extraction. Example: Extract the headline, author, and three key takeaways
schema	object	Optional	-	JSON Schema describing the structure of the data to extract. Example: {"type":"object","properties":{"title":{"type":"string"}},"required":["title"]}
provider	string	Optional	auto	LLM provider: "ollama" (local), "openai", "anthropic", or "auto" (the default, which resolves to local Ollama). Example: ollama
model	string	Optional	-	Model identifier. Per-provider defaults: llama3.2 (Ollama), gpt-4o-mini (OpenAI), claude-haiku-4-5-20251001 (Anthropic). Example: llama3.2
maxTokens	number	Optional	4096	Maximum number of tokens in the LLM response (1–32000). Example: 4096
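The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch: `build_payload` is a hypothetical helper, and it reads "one of url or content is required" as meaning exactly one of the two, which is an assumption about how the API treats a request that carries both.

```python
from typing import Any, Dict, Optional

ALLOWED_PROVIDERS = ("auto", "ollama", "openai", "anthropic")


def build_payload(
    prompt: str,
    url: Optional[str] = None,
    content: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
    provider: str = "auto",
    model: Optional[str] = None,
    max_tokens: int = 4096,
) -> Dict[str, Any]:
    """Validate arguments locally and return a JSON-serializable request body."""
    if not prompt:
        raise ValueError("prompt is required")
    # Assumption: url and content are treated as mutually exclusive.
    if (url is None) == (content is None):
        raise ValueError("provide exactly one of url or content")
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not 1 <= max_tokens <= 32000:
        raise ValueError("maxTokens must be between 1 and 32000")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "provider": provider,
        "maxTokens": max_tokens,
    }
    # Optional fields are left out when unset so the documented defaults apply.
    optional = (("url", url), ("content", content), ("schema", schema), ("model", model))
    for key, value in optional:
        if value is not None:
            payload[key] = value
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.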

Request Examples

cURL: local Ollama (default, no LLM API key required)

terminal (Bash)

curl -X POST https://crawlforge.dev/api/v1/tools/extract_with_llm \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article/42",
    "prompt": "Extract the headline, author, and three key takeaways",
    "provider": "ollama"
  }'

TypeScript: OpenAI with a schema

extractWithLlm.ts (TypeScript)

const response = await fetch('https://crawlforge.dev/api/v1/tools/extract_with_llm', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/product/123',
    prompt: 'Extract product name, price in USD, and stock status',
    provider: 'openai',
    model: 'gpt-4o-mini',
    schema: {
      type: 'object',
      properties: {
        title: { type: 'string' },
        price: { type: 'number' },
        in_stock: { type: 'boolean' },
      },
      required: ['title', 'price'],
    },
  }),
});

const data = await response.json();
if (data.success) {
  console.log(data.data.extracted);
  console.log('Tokens used:', data.data.tokens_used);
}

Python: Anthropic

extract_with_llm.py (Python)

import os

import requests

response = requests.post(
    'https://crawlforge.dev/api/v1/tools/extract_with_llm',
    headers={
        'X-API-Key': os.environ['CRAWLFORGE_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com/article/42',
        'prompt': 'Extract headline, author, publish date (ISO 8601), and tags',
        'provider': 'anthropic',
        'model': 'claude-haiku-4-5-20251001',
        'schema': {
            'type': 'object',
            'properties': {
                'headline': {'type': 'string'},
                'author': {'type': 'string'},
                'published_at': {'type': 'string'},
                'tags': {'type': 'array', 'items': {'type': 'string'}},
            },
            'required': ['headline'],
        },
    },
)

data = response.json()
if data['success']:
    print(data['data']['extracted'])

Response Example

200 OK (1.4s)

{
  "success": true,
  "data": {
    "provider_used": "ollama",
    "model_used": "llama3.2",
    "tokens_used": 842,
    "extracted": {
      "headline": "How Local LLMs Are Changing Data Pipelines",
      "author": "Jane Doe",
      "takeaways": [
        "Lower cost",
        "Better privacy",
        "Faster iteration"
      ]
    },
    "prompt_used": "Extract the headline, author, and three key takeaways"
  },
  "credits_used": 3,
  "credits_remaining": 997,
  "processing_time": 1420
}

Field Descriptions

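data.provider_used: the provider that was resolved; this is "ollama" when provider is "auto".

data.model_used: the provider's default model, unless you specify one.

data.tokens_used: the number of tokens consumed by this extraction.

credits_used: a flat 3 credits, regardless of which provider is used.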
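The response fields above can be read into a small typed structure. This is a sketch under stated assumptions: `ExtractionResult`, `parse_response`, and `calls_remaining` are hypothetical names, and the shape of a failed response (beyond `success` being false) is not documented here, so the sketch only raises a generic error in that case.

```python
from dataclasses import dataclass
from typing import Any, Dict

CREDITS_PER_CALL = 3  # flat cost stated for this tool


@dataclass(frozen=True)
class ExtractionResult:
    provider_used: str
    model_used: str
    tokens_used: int
    extracted: Any
    credits_used: int
    credits_remaining: int


def parse_response(body: Dict[str, Any]) -> ExtractionResult:
    """Convert a successful JSON response body into an ExtractionResult."""
    if not body.get("success"):
        raise RuntimeError("extract_with_llm reported success=false")
    data = body["data"]
    return ExtractionResult(
        provider_used=data["provider_used"],
        model_used=data["model_used"],
        tokens_used=data["tokens_used"],
        extracted=data["extracted"],
        credits_used=body["credits_used"],
        credits_remaining=body["credits_remaining"],
    )


def calls_remaining(result: ExtractionResult) -> int:
    """Number of further calls the remaining credits would cover."""
    return result.credits_remaining // CREDITS_PER_CALL
```

With the example response, 997 remaining credits cover 332 more calls at 3 credits each.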

Credit Cost