Advanced Tools · 5 credits

batch_scrape

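Scrape multiple URLs in parallel with asynchronous job management, webhook notifications, and configurable concurrency. Well suited to bulk data collection and automated workflows.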

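Use Cases

Bulk data collection

Scrape product catalogs, news articles, or research papers across many pages at once

Competitive analysis

Monitor pricing, features, and content on competitor websites in a single batch

Automated workflows

Integrate with webhooks to process results as soon as a scrape job completes

Scheduled reports

Generate daily reports by batch-scraping dashboards, analytics, or status pages

Content archiving

Archive multiple pages as screenshots or PDFs for compliance or historical records

Parallel processing

Control the concurrency level to optimize speed while respecting rate limits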

Endpoint

POST /api/v1/tools/batch_scrape

Auth Required

Free plan: 2 req/s

5 credits

Parameters

Name	Type	Required	Default	Description
urls	string[]	Required	-	Array of URLs to scrape (1-50 URLs). Example: ["https://example.com", "https://example.org"]
formats	string[]	Optional	["markdown"]	Output formats for each URL: markdown, html, text, screenshot, or pdf. Example: ["markdown", "screenshot"]
webhook	string	Optional	-	Webhook URL that receives the job-completion notification (must be HTTPS). Example: https://yourapp.com/webhook/scrape-complete
maxConcurrency	number	Optional	5	Maximum number of concurrent requests (1-10). Example: 10
timeout	number	Optional	30000	Per-URL timeout in milliseconds. Example: 45000
onlyMainContent	boolean	Optional	false	Extract only the main content, stripping boilerplate. Example: true
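
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_batch_payload` and `ALLOWED_FORMATS` are hypothetical names, and the checks simply mirror the documented ranges and defaults.

```python
from urllib.parse import urlparse

# Output formats listed in the Parameters table.
ALLOWED_FORMATS = {"markdown", "html", "text", "screenshot", "pdf"}


def build_batch_payload(urls, formats=None, webhook=None,
                        max_concurrency=5, timeout=30000,
                        only_main_content=False):
    """Hypothetical helper: validate arguments and return the JSON body as a dict."""
    if not 1 <= len(urls) <= 50:
        raise ValueError("urls must contain between 1 and 50 entries")

    formats = list(formats) if formats is not None else ["markdown"]
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")

    if not 1 <= max_concurrency <= 10:
        raise ValueError("maxConcurrency must be between 1 and 10")

    payload = {
        "urls": list(urls),
        "formats": formats,
        "maxConcurrency": max_concurrency,
        "timeout": timeout,
        "onlyMainContent": only_main_content,
    }

    # The webhook is optional, but only HTTPS URLs are accepted.
    if webhook is not None:
        if urlparse(webhook).scheme != "https":
            raise ValueError("webhook must be an HTTPS URL")
        payload["webhook"] = webhook

    return payload
```

Validating locally avoids spending a round trip on a request that would come back as 400 Bad Request.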

Webhook Payload

When the batch completes, your webhook URL receives:

webhook-payload.json (JSON)

{
  "jobId": "batch_1234567890abcdef",
  "status": "completed",
  "totalUrls": 3,
  "successful": 3,
  "failed": 0,
  "completedAt": "2025-10-01T12:01:45Z",
  "results": [
    {
      "url": "https://example.com/page1",
      "status": "success",
      "formats": {
        "markdown": "https://cdn.crawlforge.dev/results/...",
        "screenshot": "https://cdn.crawlforge.dev/results/..."
      }
    },
    {
      "url": "https://example.com/page2",
      "status": "success",
      "formats": {
        "markdown": "https://cdn.crawlforge.dev/results/...",
        "screenshot": "https://cdn.crawlforge.dev/results/..."
      }
    },
    {
      "url": "https://example.com/page3",
      "status": "success",
      "formats": {
        "markdown": "https://cdn.crawlforge.dev/results/...",
        "screenshot": "https://cdn.crawlforge.dev/results/..."
      }
    }
  ]
}
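
A webhook receiver usually needs only a few fields from this payload. The sketch below shows one way to reduce it to a summary. `summarize_webhook` is a hypothetical name, and because the sample above only shows `"status": "success"`, treating any other per-URL status as a failure is an assumption.

```python
import json


def summarize_webhook(body: str) -> dict:
    """Hypothetical helper: reduce a webhook payload to the parts a consumer needs."""
    payload = json.loads(body)
    results = payload.get("results", [])

    succeeded = [r for r in results if r.get("status") == "success"]
    # Assumption: any status other than "success" marks a failed URL.
    failed_urls = [r["url"] for r in results if r.get("status") != "success"]

    return {
        "job_id": payload["jobId"],
        "done": payload["status"] == "completed",
        # Map each scraped URL to the link of its markdown result, when present.
        "markdown_links": {
            r["url"]: r["formats"]["markdown"]
            for r in succeeded
            if "markdown" in r.get("formats", {})
        },
        "failed_urls": failed_urls,
    }
```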

Request Example

terminal (Bash)

curl -X POST https://crawlforge.dev/api/v1/tools/batch_scrape \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "formats": ["markdown", "screenshot"],
    "webhook": "https://yourapp.com/webhook",
    "maxConcurrency": 5,
    "onlyMainContent": true
  }'
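
For callers who prefer Python over curl, the same request might be assembled with the standard library as sketched below. The URL and the `X-API-Key` header are taken from the curl example above; `make_request` is a hypothetical helper that only builds the request object, so nothing is sent until `urlopen` is called.

```python
import json
import urllib.request

# Endpoint URL as used in the curl example.
API_URL = "https://crawlforge.dev/api/v1/tools/batch_scrape"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = make_request("cf_test_YOUR_KEY", {
    "urls": ["https://example.com/page1", "https://example.com/page2"],
    "formats": ["markdown", "screenshot"],
    "webhook": "https://yourapp.com/webhook",
    "maxConcurrency": 5,
    "onlyMainContent": True,
})

# To actually send it:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     body = json.load(resp)
```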

Response Example

200 OK · 156 ms

{
  "success": true,
  "data": {
    "jobId": "batch_1234567890abcdef",
    "status": "processing",
    "totalUrls": 3,
    "completed": 0,
    "successful": 0,
    "failed": 0,
    "startedAt": "2025-10-01T12:00:00Z",
    "estimatedCompletionAt": "2025-10-01T12:02:00Z",
    "results": []
  },
  "credits_used": 5,
  "credits_remaining": 995,
  "processing_time": 156
}

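Field Descriptions

data.jobId: Unique identifier for tracking this batch job

data.status: Job status: queued, processing, completed, or failed

data.totalUrls: Total number of URLs in the batch

data.completed: Number of URLs processed so far (successful + failed)

data.estimatedCompletionAt: Estimated completion time, based on the concurrency setting

credits_used: 5 credits per batch request (flat fee)

credits_remaining: Your remaining credit balance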
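
The fields described above are enough to derive a simple progress indicator. This is a minimal sketch over a response shaped like the example; `batch_progress` is a hypothetical name, and treating both `completed` and `failed` as terminal states is an assumption based on the status list.

```python
def batch_progress(response: dict) -> dict:
    """Hypothetical helper: derive progress figures from a batch_scrape response."""
    data = response["data"]
    total = data["totalUrls"]
    completed = data["completed"]  # successful + failed

    return {
        "job_id": data["jobId"],
        # Assumption: "completed" and "failed" are the terminal job states.
        "finished": data["status"] in ("completed", "failed"),
        "fraction_done": completed / total if total else 0.0,
        "pending": total - completed,
    }
```

Applied to the response example, this yields `fraction_done` of 0.0 with 3 URLs pending, since the job has only just been accepted.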

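Error Handling

Too many URLs (400 Bad Request)

Each batch accepts at most 50 URLs. Split larger batches into multiple requests.

Invalid webhook URL (400 Bad Request)

The webhook must be a valid HTTPS URL. HTTP webhooks are not supported for security reasons.

Insufficient credits (402 Payment Required)

Each call reserves 5 credits up front. Add more credits before retrying.

Job not found (404 Not Found)

The job ID does not exist or has expired. Jobs are retained for 7 days after completion.

Pro tip: For large batches, use webhooks instead of polling. This reduces API calls and improves reliability. Failed URLs do not consume credits; only successful scrapes are billed.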
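
Because a batch is capped at 50 URLs, a longer list has to be split before submission. One way to do that is sketched here; `chunk_urls` is a hypothetical helper, and each resulting chunk would be sent as its own batch_scrape request.

```python
def chunk_urls(urls, size=50):
    """Hypothetical helper: split a URL list into consecutive chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# 120 URLs become three batches: 50, 50, and 20.
all_urls = [f"https://example.com/page{n}" for n in range(1, 121)]
batches = chunk_urls(all_urls)
```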

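Credit Cost

5 credits

5 credits per request

Each batch request is charged a flat fee. A batch processes up to 50 URLs, with parallel execution and webhook notifications.

What's included:

Up to 50 URLs per batch

Parallel processing with configurable concurrency

Multiple output formats (markdown, HTML, text, screenshot, PDF)

Webhook notification on completion

Asynchronous job management

Plan recommendations:

Free plan: 1,000 one-time trial credits = 200 batch requests

Hobby plan: 5,000 credits = 1,000 batch requests ($19/mo)

Professional plan: 50,000 credits = 10,000 batch requests ($99/mo)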
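
The plan figures above follow from dividing each credit allowance by the flat 5-credit fee. The short sketch below reproduces that arithmetic and also derives the upper bound on URLs when every batch is filled to 50. `plan_capacity` is a hypothetical name; the constants come from this page.

```python
CREDITS_PER_BATCH = 5      # flat fee per batch request
MAX_URLS_PER_BATCH = 50    # documented batch size limit


def plan_capacity(credits: int) -> tuple[int, int]:
    """Hypothetical helper: return (batch requests, maximum URLs) for a credit allowance."""
    batches = credits // CREDITS_PER_BATCH
    return batches, batches * MAX_URLS_PER_BATCH


for name, credits in [("Free", 1_000), ("Hobby", 5_000), ("Professional", 50_000)]:
    batches, max_urls = plan_capacity(credits)
    print(f"{name}: {batches} batch requests, up to {max_urls} URLs")
```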

