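OpenAI's Agents SDK provides a production-ready framework for building autonomous AI agents with tool calling, handoffs, and guardrails. CrawlForge supplies the missing piece: live web access. By connecting CrawlForge's 20 scraping tools to your OpenAI agents, you can let them search the web, extract structured data, read documentation, and run multi-source research, all within the Agents SDK's orchestration framework.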

This guide shows you how to define CrawlForge tools as OpenAI agent functions and build agents that act on live web data. The examples use the TypeScript SDK.

Prerequisites

Bash

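# TypeScript/Node.js SDK (used by every example in this guide);
# check the SDK README for the zod version it supports
npm install @openai/agents zod dotenv

# or the Python SDK:
pip install openai-agents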

Bash

# .env
OPENAI_API_KEY=sk-xxxxx
CRAWLFORGE_API_KEY=cf_live_xxxxx

Get your CrawlForge API key at crawlforge.dev/signup. New accounts include 1,000 free credits.

Architecture: CrawlForge + OpenAI agents

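The OpenAI Agents SDK uses a tool pattern similar to the function-calling API, with richer orchestration around it. You define each tool as a function with typed parameters, and the agent decides when and how to call it. In the TypeScript SDK, parameters are a Zod schema that the SDK converts to JSON Schema.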

User Query -> OpenAI Agent -> Tool Selection -> CrawlForge API -> Results -> Agent Response

CrawlForge's REST API at https://crawlforge.dev/api/v1/tools/ maps cleanly onto the Agents SDK's tool definition format. Each tool becomes one function the agent can call.
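
Because every tool lives at its own path under that base URL, the mapping from an agent function call to an HTTP request is mechanical. The minimal sketch below builds that request for a given tool name. It assumes the POST + JSON body + Bearer token pattern that the Step 1 client uses. `buildToolRequest` is a hypothetical helper, not part of any SDK.

```typescript
// Hypothetical helper (not part of any SDK): builds the HTTP request for one
// CrawlForge tool call, assuming the POST + JSON + Bearer pattern used in Step 1.
const CRAWLFORGE_BASE = 'https://crawlforge.dev/api/v1/tools';

interface ToolRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildToolRequest(
  toolName: string,
  params: Record<string, unknown>,
  apiKey: string,
): ToolRequest {
  // The tool names in this guide are lowercase snake_case; rejecting anything
  // else keeps a model-supplied name from escaping the /tools/ path.
  if (!/^[a-z_]+$/.test(toolName)) {
    throw new Error(`Invalid tool name: ${toolName}`);
  }
  return {
    url: `${CRAWLFORGE_BASE}/${toolName}`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(params),
  };
}

// One agent function call corresponds to one REST request.
const req = buildToolRequest('search_web', { query: 'web scraping law', limit: 3 }, 'cf_live_xxxxx');
console.log(req.url); // https://crawlforge.dev/api/v1/tools/search_web
```

To send it, call `fetch(req.url, req)`. Validating the tool name matters because the model chooses which tool to call.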

Step 1: Create the CrawlForge tool functions

First, create a reusable CrawlForge client and the tool definitions:

Typescript

// lib/crawlforge-tools.ts
import 'dotenv/config'; // loads CRAWLFORGE_API_KEY and OPENAI_API_KEY from .env
import { tool } from '@openai/agents';
import { z } from 'zod';

const CRAWLFORGE_BASE = 'https://crawlforge.dev/api/v1/tools';

// Shared client: POSTs the tool parameters as JSON and returns the parsed response.
async function callCrawlForge(toolName: string, params: Record<string, unknown>) {
  const apiKey = process.env.CRAWLFORGE_API_KEY;
  if (!apiKey) {
    throw new Error('CRAWLFORGE_API_KEY is not set');
  }

  const response = await fetch(`${CRAWLFORGE_BASE}/${toolName}`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(params),
  });

  if (!response.ok) {
    throw new Error(`CrawlForge error: ${response.status} ${response.statusText}`);
  }

  return response.json();
}

// Tool schemas are strict by default: every field must be required, so
// optional inputs use .nullable() and get their default inside execute().

// Search the web (5 credits)
export const searchWebTool = tool({
  name: 'search_web',
  description: 'Search Google and return top results with titles, URLs, and snippets.',
  parameters: z.object({
    query: z.string().describe('Search query'),
    limit: z.number().int().nullable().describe('Max results to return (null for the default of 5)'),
  }),
  execute: async ({ query, limit }) => {
    return callCrawlForge('search_web', { query, limit: limit ?? 5 });
  },
});

// Extract page content (2 credits)
export const extractContentTool = tool({
  name: 'extract_content',
  description: 'Extract the main readable content from a web page URL.',
  parameters: z.object({
    url: z.string().describe('Full URL (including https://) to extract content from'),
  }),
  execute: async ({ url }) => {
    return callCrawlForge('extract_content', { url });
  },
});

// Structured scraping (2 credits)
// Strict schemas also rule out free-form maps (z.record), so the model sends
// { field, selector } pairs that are turned into the map the API expects.
export const scrapeStructuredTool = tool({
  name: 'scrape_structured',
  description: 'Extract structured data from a web page using CSS selectors.',
  parameters: z.object({
    url: z.string().describe('URL to scrape'),
    selectors: z
      .array(z.object({ field: z.string(), selector: z.string() }))
      .describe('Field names, each paired with the CSS selector to extract it'),
  }),
  execute: async ({ url, selectors }) => {
    const selectorMap = Object.fromEntries(selectors.map((s) => [s.field, s.selector]));
    return callCrawlForge('scrape_structured', { url, selectors: selectorMap });
  },
});

// Fetch raw URL (1 credit)
export const fetchUrlTool = tool({
  name: 'fetch_url',
  description: 'Fetch raw HTML content from a URL. Cheapest option for simple retrieval.',
  parameters: z.object({
    url: z.string().describe('URL to fetch'),
  }),
  execute: async ({ url }) => {
    return callCrawlForge('fetch_url', { url });
  },
});
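
`callCrawlForge` throws on any non-2xx response, which fails the whole tool call. If you expect transient failures such as rate limiting, one option is to retry with exponential backoff. The sketch below is generic. Treating 429 and 5xx as retryable is an assumption, not documented CrawlForge behavior. `HttpError` and `withRetry` are hypothetical names.

```typescript
// Hypothetical error type: lets callers see the HTTP status of a failed call.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = 'HttpError';
    this.status = status;
  }
}

// Assumption: 429 (rate limited) and 5xx responses are transient; other 4xx
// errors, such as a bad URL or an invalid API key, are not worth retrying.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === 'number' && (status === 429 || status >= 500);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs fn, retrying transient failures with exponential backoff
// (baseDelayMs, then 2x, 4x, ...), and rethrows once retries are used up.
async function withRetry<T>(fn: () => Promise<T>, retries = 2, baseDelayMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

To use it:

1. Make `callCrawlForge` throw `new HttpError(response.status, ...)` instead of a plain `Error`.
2. Wrap each call, for example `execute: async ({ url }) => withRetry(() => callCrawlForge('fetch_url', { url }))`.

Keep `retries` low so a failing site does not stall the agent.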

Step 2: Build a web research agent

Create an agent that researches a topic with the CrawlForge tools:

Typescript

// agents/researcher.ts
// Top-level await below requires an ES module ("type": "module" in package.json).
import { Agent, run } from '@openai/agents';
import {
  searchWebTool,
  extractContentTool,
  scrapeStructuredTool,
  fetchUrlTool,
} from '../lib/crawlforge-tools';

const researchAgent = new Agent({
  name: 'Web Researcher',
  model: 'gpt-4o',
  instructions: `You are a thorough web researcher. When asked about a topic:
1. Search the web for relevant, recent sources
2. Read the top 2-3 results to gather comprehensive information
3. Synthesize findings into a clear, cited summary
4. Always mention the URLs you sourced data from

Use search_web to find sources, then extract_content to read them.
Prefer extract_content over fetch_url when you need readable text.`,
  // fetchUrlTool is registered so the fetch_url guidance above is actionable
  tools: [searchWebTool, extractContentTool, scrapeStructuredTool, fetchUrlTool],
});

// Run the agent: run() takes the user input as a plain string
async function research(topic: string) {
  const result = await run(researchAgent, topic);

  console.log(result.finalOutput);
  return result;
}

// Example usage
await research('What are the latest trends in web scraping regulation in 2026?');

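The agent then works on its own. It typically spends 9-11 credits per run when it reads 2-3 results:

Calls search_web to find relevant articles (5 credits)
Calls extract_content on the top-ranked results (2 credits each)
Synthesizes a summary with citations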
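
The agent decides which search hits to read. If you want to enforce that cap in code rather than only in the prompt, a small helper can deduplicate and truncate the hit list. The response shape below (`{ results: [{ title, url, snippet }] }`) is an assumption based on the search_web tool description, not a documented CrawlForge schema. Check the real response before relying on it. `topUrls` is a hypothetical helper.

```typescript
// Assumed shape of a search_web response (hypothetical: verify against the API).
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
}

interface SearchResponse {
  results: SearchHit[];
}

// Returns up to `max` URLs in ranking order, skipping hits that differ only
// by a #fragment or a trailing slash, so no page is extracted (and billed) twice.
function topUrls(response: SearchResponse, max = 3): string[] {
  const seen = new Set<string>();
  const urls: string[] = [];
  for (const hit of response.results) {
    if (urls.length >= max) break;
    const key = hit.url.split('#')[0].replace(/\/$/, '');
    if (seen.has(key)) continue;
    seen.add(key);
    urls.push(hit.url);
  }
  return urls;
}

// Keeps /a and /b#intro; the trailing-slash duplicate of /a is skipped.
const picked = topUrls(
  {
    results: [
      { title: 'A', url: 'https://example.com/a', snippet: '' },
      { title: 'A (dup)', url: 'https://example.com/a/', snippet: '' },
      { title: 'B', url: 'https://example.com/b#intro', snippet: '' },
      { title: 'C', url: 'https://example.com/c', snippet: '' },
    ],
  },
  2,
);
```

With two URLs picked, one search plus two extractions costs 5 + 2 × 2 = 9 credits.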

Step 3: Add structured data extraction

Build a data-extraction agent that pulls specific fields from a web page:

Typescript

// agents/extractor.ts
import { Agent, run } from '@openai/agents';
import { scrapeStructuredTool, fetchUrlTool } from '../lib/crawlforge-tools';

const extractorAgent = new Agent({
  name: 'Data Extractor',
  model: 'gpt-4o',
  instructions: `You are a data extraction specialist. When given a URL and a data request:
1. Determine the best CSS selectors to extract the requested data
2. Use scrape_structured to pull the data
3. Return the results in clean JSON format

For simple JSON APIs, use fetch_url instead (it costs 1 credit vs 2).`,
  tools: [scrapeStructuredTool, fetchUrlTool],
});

// Returns the agent's final text output (JSON, if the model follows the instructions)
async function extractData(url: string, description: string) {
  const result = await run(extractorAgent, `Extract from ${url}: ${description}`);
  return result.finalOutput;
}

// Extract pricing data
const pricing = await extractData(
  'https://stripe.com/pricing',
  'All plan names, prices, and key features'
);
console.log(pricing);
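
`result.finalOutput` is the model's final text, so "clean JSON" is a request in the prompt, not a guarantee. The TypeScript SDK also lets you declare an `outputType` schema on the agent to have the output validated. If you keep the prompt-based approach from this step, parse defensively before passing data downstream. `parseJsonOutput` below is a hypothetical helper. It strips an optional Markdown code fence, then calls `JSON.parse`. Returning `null` on failure is one design choice; re-running the agent is another.

```typescript
// Hypothetical helper: parses an agent's final text output as JSON.
// Models sometimes wrap JSON in a Markdown code fence (optionally tagged json)
// even when told not to, so that wrapper is stripped first.
function parseJsonOutput(output: string | undefined): unknown {
  if (!output) return null;
  let text = output.trim();
  const fenced = text.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  if (fenced) text = fenced[1];
  try {
    return JSON.parse(text);
  } catch {
    return null; // not valid JSON: the caller decides whether to retry the agent
  }
}
```

For example: `const data = parseJsonOutput(await extractData(url, description));`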

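Advanced: a multi-agent web pipeline

The Agents SDK supports handoffs between specialized agents. A handoff passes control of the conversation to the receiving agent instead of returning to the caller, so this pipeline chains the agents. The orchestrator hands off to a collector agent that finds sources, and the collector then hands off to an analyst agent: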

Typescript

// agents/pipeline.ts
import { Agent, run } from '@openai/agents';
import {
  searchWebTool,
  extractContentTool,
  scrapeStructuredTool,
} from '../lib/crawlforge-tools';

// Defined first so the collector can hand off to it.
const analystAgent = new Agent({
  name: 'Data Analyst',
  model: 'gpt-4o',
  instructions: `You analyze data collected by the Data Collector.
Identify patterns, compare data points, and produce actionable insights.
Always structure your output with clear sections and data tables.`,
  tools: [], // No web tools needed: works only with the collected data (no credits)
  handoffDescription: 'Analyzes collected web data and writes the final report',
});

const collectorAgent = new Agent({
  name: 'Data Collector',
  model: 'gpt-4o',
  instructions: `You collect raw data from the web. Search for sources,
extract their content, then hand off to the Data Analyst with the raw data.
Focus on gathering data, not analyzing it.`,
  tools: [searchWebTool, extractContentTool, scrapeStructuredTool],
  handoffs: [analystAgent], // Without this, the collector could not reach the analyst
  handoffDescription: 'Collects raw web data for analysis',
});

const orchestrator = new Agent({
  name: 'Research Orchestrator',
  model: 'gpt-4o',
  instructions: `You manage research projects. Hand off to the Data Collector
when web data is needed (it passes its findings to the Data Analyst), or
directly to the Data Analyst if the conversation already contains the data.`,
  handoffs: [collectorAgent, analystAgent],
});

// Run the multi-agent pipeline
const result = await run(
  orchestrator,
  'Compare the pricing and features of the top 3 web scraping APIs in 2026',
);

console.log(result.finalOutput);

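This pipeline separates concerns. The collector gathers the data and spends CrawlForge credits; the analyst processes it and spends none. Total cost depends on how many sources are fetched. The minimal 3-source comparison (one search, three extractions, one structured scrape) costs 13 credits, and a typical run needs 15-25 credits.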

Credit cost breakdown

Agent workflow	Tools used	Estimated credits
Single search + summary	search_web + extract_content	7
3-source research	search_web + 3x extract_content	11
Structured extraction (1 page)	scrape_structured	2
Multi-agent comparison (3 sources)	search_web + 3x extract_content + scrape_structured	13
Deep research report	deep_research	10

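CrawlForge's Free plan (1,000 credits) covers roughly 90 three-source "search and extract" workflows per month, at 11 credits each. The Professional plan ($99/month, 50,000 credits) is sized for production agent workloads.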
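
The table rows and plan figures above follow directly from the per-call prices quoted in this guide. The minimal sketch below recomputes them so you can rerun the numbers for your own workflows. The price map is copied from this guide and may change. `estimateCredits` and `runsPerPlan` are hypothetical helpers.

```typescript
// Per-call credit costs as quoted in this guide (verify against current pricing).
const CREDIT_COST = {
  fetch_url: 1,
  extract_content: 2,
  scrape_structured: 2,
  search_web: 5,
  deep_research: 10,
} as const;

type ToolName = keyof typeof CREDIT_COST;

// Sums the credits for a planned sequence of tool calls.
function estimateCredits(calls: ToolName[]): number {
  return calls.reduce((total, name) => total + CREDIT_COST[name], 0);
}

// How many complete runs of a workflow fit into a plan's credit allowance.
function runsPerPlan(planCredits: number, workflowCredits: number): number {
  return Math.floor(planCredits / workflowCredits);
}

const threeSource = estimateCredits(['search_web', 'extract_content', 'extract_content', 'extract_content']);
const comparison = estimateCredits([
  'search_web', 'extract_content', 'extract_content', 'extract_content', 'scrape_structured',
]);
console.log(threeSource, comparison); // 11 13
console.log(runsPerPlan(1000, threeSource)); // 90
console.log(runsPerPlan(50000, threeSource)); // 4545
```

The cheaper 7-credit single search + summary workflow would fit about 142 times into the free allowance.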

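Best practices

Prefer the cheapest tool. In the agent's instructions, steer it to use fetch_url (1 credit) when full HTML is acceptable, and extract_content (2 credits) only when it needs clean text. Reserve deep_research (10 credits) for complex multi-source queries.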
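
One way to keep that steering consistent is to generate the cost guidance in the agent's instructions from a single price table, listed cheapest first. This is a minimal sketch. The `TOOLS` entries and `costGuidance` name are illustrative, and the prices are the ones quoted in this guide.

```typescript
// Illustrative price/usage table, using the per-call costs quoted in this guide.
interface ToolInfo {
  name: string;
  credits: number;
  useWhen: string;
}

const TOOLS: ToolInfo[] = [
  { name: 'extract_content', credits: 2, useWhen: 'you need clean, readable text' },
  { name: 'deep_research', credits: 10, useWhen: 'the question needs many sources' },
  { name: 'fetch_url', credits: 1, useWhen: 'raw HTML or a JSON API response is enough' },
];

// Renders a cheapest-first guidance block to append to an agent's instructions,
// so the prompt stays in sync with the price table. Does not mutate `tools`.
function costGuidance(tools: ToolInfo[]): string {
  const lines = [...tools]
    .sort((a, b) => a.credits - b.credits)
    .map((t) => `- ${t.name} (${t.credits} credit${t.credits === 1 ? '' : 's'}): use when ${t.useWhen}.`);
  return ['Prefer the cheapest tool that does the job:', ...lines].join('\n');
}

console.log(costGuidance(TOOLS));
```

You would then append the result to an agent's instructions, e.g. `instructions: baseInstructions + '\n\n' + costGuidance(TOOLS)`.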

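Limit agent steps. Cap how many turns an agent may take to keep costs bounded. In the TypeScript SDK, run() accepts a maxTurns option, e.g. run(agent, input, { maxTurns: 5 }). Most research tasks finish within 3-5 tool calls.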
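
A step limit bounds the number of turns, but not the credits spent directly. A complementary sketch is a per-run credit budget that refuses any call that would overshoot. `CreditBudget` is a hypothetical class, and the prices are the ones quoted in this guide. How an error thrown from a tool's execute() is reported back to the model depends on your SDK version and configuration.

```typescript
// Per-call prices quoted in this guide (an assumption: check current pricing).
const PRICES: Record<string, number> = {
  fetch_url: 1,
  extract_content: 2,
  scrape_structured: 2,
  search_web: 5,
  deep_research: 10,
};

// Hypothetical per-run credit budget: call charge() in each tool's execute()
// before contacting CrawlForge; it throws instead of overspending.
class CreditBudget {
  private spent = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get remaining(): number {
    return this.limit - this.spent;
  }

  // Records the cost of one call and returns the credits left afterwards.
  charge(tool: string): number {
    const cost = PRICES[tool];
    if (cost === undefined) throw new Error(`Unknown tool: ${tool}`);
    if (this.spent + cost > this.limit) {
      throw new Error(`Credit budget exceeded: ${tool} costs ${cost}, ${this.remaining} left`);
    }
    this.spent += cost;
    return this.remaining;
  }
}
```

Charging before the request is conservative: a call that then fails still counts against the budget. Create a new `CreditBudget` for each `run()`.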

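Use handoffs for complex pipelines. Rather than one agent carrying every tool, split the responsibilities. A collector agent handles web access (and spends credits), while an analyst agent processes the data (no credits).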

Cache tool outputs. If your agents hit the same URL repeatedly, cache the responses so you are not charged credits twice for the same page.
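
Below is a minimal in-process cache. It assumes that a result fetched within the TTL is fresh enough for your agent; choose the TTL per use case. `ToolCache` is a hypothetical name. Across several processes you would swap the Map for a shared store.

```typescript
// Hypothetical in-memory TTL cache for tool results, keyed by tool name + params.
// Note: params with the same keys in a different order produce different keys.
class ToolCache {
  private readonly entries = new Map<string, { value: unknown; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns a fresh cached result, or calls `fetcher` and caches what it returns.
  // Failed fetches throw before anything is stored, so errors are never cached.
  async getOrFetch<T>(
    tool: string,
    params: Record<string, unknown>,
    fetcher: () => Promise<T>,
  ): Promise<T> {
    const key = `${tool}:${JSON.stringify(params)}`;
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value as T;
    const value = await fetcher();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Inside a tool, this becomes `execute: async ({ url }) => cache.getOrFetch('fetch_url', { url }, () => callCrawlForge('fetch_url', { url }))`.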

Monitor usage. Track credit consumption in the CrawlForge dashboard and set alerts for unusual spikes.
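
Alongside the dashboard, a per-process tally can log what each agent run spent and flag a runaway run early. This is a minimal sketch. `UsageLedger` is hypothetical, and it assumes you call `record()` from each tool's execute() with the per-call prices quoted in this guide. The dashboard remains the authoritative record.

```typescript
// Hypothetical local usage ledger: totals credits per tool and fires a callback
// once when spend crosses a threshold. Only a local proxy for billing data.
class UsageLedger {
  private readonly byTool = new Map<string, number>();
  private readonly alertAt: number;
  private readonly onAlert: (total: number) => void;
  private alerted = false;

  constructor(alertAt: number, onAlert: (total: number) => void) {
    this.alertAt = alertAt;
    this.onAlert = onAlert;
  }

  record(tool: string, credits: number): void {
    this.byTool.set(tool, (this.byTool.get(tool) ?? 0) + credits);
    const total = this.total();
    if (!this.alerted && total >= this.alertAt) {
      this.alerted = true; // alert once per ledger, not on every later call
      this.onAlert(total);
    }
  }

  total(): number {
    let sum = 0;
    this.byTool.forEach((credits) => {
      sum += credits;
    });
    return sum;
  }

  // Per-tool breakdown, e.g. to log at the end of each agent run.
  summary(): Record<string, number> {
    const out: Record<string, number> = {};
    this.byTool.forEach((credits, tool) => {
      out[tool] = credits;
    });
    return out;
  }
}
```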

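Next steps

You now have OpenAI agents with access to live web data. Keep building:

Overview of the 20 CrawlForge tools: register more tools with your agents
Stealth mode scraping: reach sites protected by anti-bot measures
Deep research automation: generate comprehensive reports with the 10-credit deep_research tool
CrawlForge quick start: the complete MCP setup guide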

Give your OpenAI agents eyes on the web. Start free with 1,000 credits, no credit card required.


About the author

CrawlForge Team


Related articles

How to use CrawlForge in LangGraph agents

How to use CrawlForge with Mastra AI agents

How to add web scraping to ChatGPT with MCP connectors (2026)
