LangGraph is LangChain's framework for building stateful, graph-based AI agents. By integrating CrawlForge tools as graph nodes, you can build agents that make intelligent decisions about what to scrape, when to dig deeper, and how to synthesize web data across multiple steps.

This guide shows you how to build a complete scraping agent with LangGraph and CrawlForge in TypeScript.

What Is LangGraph?
Prerequisites
Step 1: Project Setup
Step 2: Define CrawlForge Tools for LangGraph
Step 3: Design the Agent State
Step 4: Build Graph Nodes
Step 5: Wire the Graph Together
Step 6: Run the Agent
Credit Cost Reference
LangGraph vs Direct LangChain for Scraping
Next Steps

What Is LangGraph?

LangGraph is a low-level orchestration framework for building reliable AI agents. Unlike simple chain-based architectures, LangGraph models agent logic as a directed graph where:

Nodes represent actions (tool calls, LLM invocations, data processing)
Edges define transitions between nodes, including conditional routing
State persists across the entire graph execution

This architecture is ideal for scraping agents because web scraping inherently involves decisions: Should I scrape deeper? Is this page blocked? Do I need to switch to stealth mode? LangGraph lets you model these decisions as conditional edges in a graph.
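
As a rough illustration of how those questions turn into routing logic, the sketch below writes them as a plain function that returns the name of the next step, which is the general shape a conditional-edge router takes. The `PageResult` shape, the status codes treated as "blocked", the depth budget, and the route names are all assumptions made for this example; none of them come from CrawlForge or LangGraph.

```typescript
// Hypothetical outcome of one scrape attempt (not a CrawlForge type).
interface PageResult {
  status: number;       // HTTP status returned by the target site
  linkCount: number;    // links found on the page
  depth: number;        // hops from the starting URL
  usedStealth: boolean; // whether this attempt already used stealth mode
}

type Route = 'retry_stealth' | 'scrape_deeper' | 'finish';

const MAX_DEPTH = 2; // assumed crawl-depth budget for the example

// Router: inspect the latest result and name the next node to run.
function routeAfterScrape(page: PageResult): Route {
  const blocked = page.status === 403 || page.status === 429;
  if (blocked && !page.usedStealth) return 'retry_stealth';
  if (blocked) return 'finish'; // stealth was already tried, so give up
  if (page.linkCount > 0 && page.depth < MAX_DEPTH) return 'scrape_deeper';
  return 'finish';
}

console.log(routeAfterScrape({ status: 403, linkCount: 0, depth: 0, usedStealth: false })); // retry_stealth
console.log(routeAfterScrape({ status: 200, linkCount: 12, depth: 1, usedStealth: false })); // scrape_deeper
console.log(routeAfterScrape({ status: 200, linkCount: 12, depth: 2, usedStealth: false })); // finish
```

Keeping the router a pure function of state makes each decision easy to unit-test before it is attached to a graph.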

Prerequisites

Node.js 18+ and TypeScript 5+
A CrawlForge account with an API key (1,000 free credits)
Familiarity with LangChain basics

Step 1: Project Setup

Bash

mkdir langgraph-scraper && cd langgraph-scraper
npm init -y
npm install @langchain/langgraph @langchain/anthropic @langchain/core zod dotenv
npm install -D typescript @types/node tsx

Create tsconfig.json:

Json

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "esModuleInterop": true,
    "outDir": "dist"
  },
  "include": ["src/**/*"]
}

Add your API keys to .env:

Bash

CRAWLFORGE_API_KEY=cf_live_your_key_here
ANTHROPIC_API_KEY=sk-ant-your_key_here
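
A missing key otherwise only shows up later as a failed HTTP request, so it can help to check both variables at startup. The helper below is a minimal sketch: `missingKeys` is a hypothetical name, and the environment is passed in as an argument (in a real entry point you would pass `process.env`) so the function stays easy to test.

```typescript
// The two keys this guide expects to find in .env.
const REQUIRED_KEYS = ['CRAWLFORGE_API_KEY', 'ANTHROPIC_API_KEY'] as const;

// Return the names of required keys that are unset or blank.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => (env[key] ?? '').trim() === '');
}

// Example with a blank Anthropic key.
const missing = missingKeys({
  CRAWLFORGE_API_KEY: 'cf_live_example',
  ANTHROPIC_API_KEY: '',
});

if (missing.length > 0) {
  console.log(`Missing environment variables: ${missing.join(', ')}`);
  // prints: Missing environment variables: ANTHROPIC_API_KEY
}
```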

Step 2: Define CrawlForge Tools for LangGraph

Create typed tool wrappers that LangGraph can invoke:

Typescript

// src/tools.ts
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

const CRAWLFORGE_API = 'https://crawlforge.dev/api/v1/tools';

async function callCrawlForge(
  endpoint: string,
  params: Record<string, unknown>
): Promise<string> {
  const response = await fetch(`${CRAWLFORGE_API}/${endpoint}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.CRAWLFORGE_API_KEY}`,
    },
    body: JSON.stringify(params),
  });

  if (!response.ok) {
    return JSON.stringify({ error: `HTTP ${response.status}`, endpoint });
  }

  const data = await response.json();
  return JSON.stringify(data);
}

export const searchWebTool = tool(
  async ({ query, limit }) => {
    return callCrawlForge('search_web', { query, limit });
  },
  {
    name: 'search_web',
    description: 'Search the web for information. Costs 5 credits. Use when you need to find URLs for a topic.',
    schema: z.object({
      query: z.string().describe('Search query'),
      limit: z.number().default(5).describe('Max results'),
    }),
  }
);

export const extractContentTool = tool(
  async ({ url }) => {
    return callCrawlForge('extract_content', { url });
  },
  {
    name: 'extract_content',
    description: 'Extract clean readable content from a URL. Costs 2 credits.',
    schema: z.object({
      url: z.string().describe('URL to extract content from'),
    }),
  }
);

export const scrapeStructuredTool = tool(
  async ({ url, selectors }) => {
    return callCrawlForge('scrape_structured', { url, selectors });
  },
  {
    name: 'scrape_structured',
    description: 'Extract structured data using CSS selectors. Costs 2 credits.',
    schema: z.object({
      url: z.string().describe('URL to scrape'),
      selectors: z.record(z.string(), z.string()).describe('CSS selectors map'),
    }),
  }
);

export const fetchUrlTool = tool(
  async ({ url }) => {
    return callCrawlForge('fetch_url', { url });
  },
  {
    name: 'fetch_url',
    description: 'Fetch raw HTML from a URL. Cheapest option at 1 credit.',
    schema: z.object({
      url: z.string().describe('URL to fetch'),
    }),
  }
);

export const allTools = [
  searchWebTool,
  extractContentTool,
  scrapeStructuredTool,
  fetchUrlTool,
];
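
Because `callCrawlForge` returns a JSON string in both the success and failure cases, downstream code has to inspect the payload to tell them apart. The sketch below shows one way to do that. `parseToolOutput` and `ToolOutcome` are hypothetical names, and the check assumes that a successful CrawlForge response never carries a top-level string field named `error`, which this guide does not confirm.

```typescript
// Result of inspecting a tool's string output.
type ToolOutcome =
  | { ok: true; data: unknown }
  | { ok: false; reason: string };

// On HTTP failure, callCrawlForge returns text of the form
// {"error":"HTTP <status>","endpoint":"<name>"}.
function parseToolOutput(raw: string): ToolOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: 'Tool output was not valid JSON' };
  }

  if (typeof parsed === 'object' && parsed !== null) {
    const maybeError = (parsed as { error?: unknown }).error;
    if (typeof maybeError === 'string') {
      return { ok: false, reason: maybeError };
    }
  }
  return { ok: true, data: parsed };
}

const failed = parseToolOutput('{"error":"HTTP 429","endpoint":"search_web"}');
console.log(failed); // { ok: false, reason: 'HTTP 429' }

const succeeded = parseToolOutput('{"results":[]}');
console.log(succeeded.ok); // true
```

A router could use the `ok` flag to send failed calls to a fallback node instead of handing the error text back to the LLM.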

Step 3: Design the Agent State

LangGraph agents maintain state across graph execution. Define a state shape that tracks scraping progress:

Typescript

// src/state.ts
import { BaseMessage } from '@langchain/core/messages';
import { Annotation } from '@langchain/langgraph';

// Define the graph state
export const AgentState = Annotation.Root({
  // Conversation messages (LLM context)
  messages: Annotation<BaseMessage[]>({
    reducer: (prev, next) => [...prev, ...next],
    default: () => [],
  }),

  // URLs discovered during research
  discoveredUrls: Annotation<string[]>({
    reducer: (prev, next) => [...new Set([...prev, ...next])],
    default: () => [],
  }),

  // Content extracted from URLs
  extractedContent: Annotation<Record<string, string>>({
    reducer: (prev, next) => ({ ...prev, ...next }),
    default: () => ({}),
  }),

  // Total credits consumed
  creditsUsed: Annotation<number>({
    reducer: (prev, next) => prev + next,
    default: () => 0,
  }),

  // Current phase of the scraping pipeline
  phase: Annotation<'search' | 'extract' | 'analyze' | 'complete'>({
    reducer: (_prev, next) => next,
    default: () => 'search' as const,
  }),
});
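
To see what the reducers do, it helps to apply them by hand. The sketch below re-declares three of them as standalone functions (equivalent to the ones in `AgentState`, with `Array.from` in place of the spread over a `Set`) and feeds them two successive updates. The URLs are placeholders, and no LangGraph code is involved.

```typescript
// Standalone equivalents of the reducers declared in AgentState.
const mergeUrls = (prev: string[], next: string[]): string[] =>
  Array.from(new Set(prev.concat(next)));
const mergeContent = (
  prev: Record<string, string>,
  next: Record<string, string>
): Record<string, string> => ({ ...prev, ...next });
const addCredits = (prev: number, next: number): number => prev + next;

// Start from the same defaults the state declares.
let urls: string[] = [];
let content: Record<string, string> = {};
let credits = 0;

// Update 1: a search step reports two URLs and 5 credits.
urls = mergeUrls(urls, ['https://example.com/a', 'https://example.com/b']);
credits = addCredits(credits, 5);

// Update 2: an extraction step reports one page, one repeated URL, 2 credits.
urls = mergeUrls(urls, ['https://example.com/b', 'https://example.com/c']);
content = mergeContent(content, { 'https://example.com/b': 'page text' });
credits = addCredits(credits, 2);

console.log(urls.length);                 // 3 (the repeated /b is dropped)
console.log(Object.keys(content).length); // 1
console.log(credits);                     // 7
```

Note that a node returns only its delta (for example `creditsUsed: 2`), not the running total; the reducer does the accumulation.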

Step 4: Build Graph Nodes

Each node in the graph performs a specific action and updates state:

Typescript

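// src/nodes.ts
import { ChatAnthropic } from '@langchain/anthropic';
import { AIMessage, HumanMessage, SystemMessage, ToolMessage } from '@langchain/core/messages';
import { ToolNode } from '@langchain/langgraph/prebuilt';
import { AgentState } from './state';
import { allTools } from './tools';

// Credits charged per tool call (see the Credit Cost Reference table)
const TOOL_COSTS: Record<string, number> = {
  fetch_url: 1,
  extract_content: 2,
  scrape_structured: 2,
  search_web: 5,
};

const model = new ChatAnthropic({
  model: 'claude-sonnet-4-20250514',
  temperature: 0,
}).bindTools(allTools);

// Node: LLM decides which tool to call next
export async function agentNode(
  state: typeof AgentState.State
) {
  const systemPrompt = new SystemMessage(
    `You are a web research agent. Your goal is to find and extract information efficiently.
    Prefer cheaper tools where they suffice: fetch_url (1 credit), then extract_content (2 credits), then search_web (5 credits).
    Credits used so far: ${state.creditsUsed}. Stop when you have enough information or reach 20 credits.`
  );

  const response = await model.invoke([systemPrompt, ...state.messages]);

  return { messages: [response] };
}

// Node: Execute tool calls, then record credits, URLs, and page content.
// ToolNode only returns messages, so the other state fields are updated here.
const toolExecutor = new ToolNode(allTools);

export async function toolNode(
  state: typeof AgentState.State
) {
  const lastMessage = state.messages[state.messages.length - 1] as AIMessage;
  const result = await toolExecutor.invoke({ messages: state.messages });
  const outputs = result.messages as ToolMessage[];

  let creditsUsed = 0;
  const extractedContent: Record<string, string> = {};
  const discoveredUrls: string[] = [];

  for (const call of lastMessage.tool_calls ?? []) {
    creditsUsed += TOOL_COSTS[call.name] ?? 0;
    const output = outputs.find((m) => m.tool_call_id === call.id);
    if (!output) continue;
    const text = String(output.content);
    if (call.name === 'search_web') {
      // The response shape is not assumed: collect anything that looks like a URL
      discoveredUrls.push(...(text.match(/https?:\/\/[^\s"\\]+/g) ?? []));
    } else if (typeof call.args.url === 'string') {
      extractedContent[call.args.url] = text;
    }
  }

  return { messages: outputs, creditsUsed, extractedContent, discoveredUrls };
}

// Node: Analyze extracted content and decide next step
export async function analyzeNode(
  state: typeof AgentState.State
) {
  const extractedCount = Object.keys(state.extractedContent).length;

  if (extractedCount >= 3 || state.creditsUsed >= 20) {
    return { phase: 'complete' as const };
  }

  // Not enough material yet: ask the agent to keep going
  return {
    phase: 'extract' as const,
    messages: [new HumanMessage('Continue researching: extract more pages before answering.')],
  };
}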
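
The stopping rule inside `analyzeNode` can be checked in isolation. The sketch below lifts it into a pure function using the same two thresholds (3 extracted pages, 20 credits); `decidePhase` and the constant names are hypothetical and exist only for this illustration.

```typescript
type NextPhase = 'extract' | 'complete';

// Thresholds taken from analyzeNode.
const TARGET_PAGES = 3;
const CREDIT_BUDGET = 20;

// Finish once enough pages are in hand or the credit budget is spent.
function decidePhase(extractedCount: number, creditsUsed: number): NextPhase {
  const enoughPages = extractedCount >= TARGET_PAGES;
  const budgetSpent = creditsUsed >= CREDIT_BUDGET;
  return enoughPages || budgetSpent ? 'complete' : 'extract';
}

console.log(decidePhase(1, 7));  // extract
console.log(decidePhase(3, 11)); // complete
console.log(decidePhase(0, 20)); // complete
```

The third call shows the budget acting as a hard stop even when nothing has been extracted, which is the case worth handling explicitly in the final answer.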

Step 5: Wire the Graph Together

Connect nodes with edges and conditional routing:

Typescript

// src/graph.ts
import { StateGraph, END } from '@langchain/langgraph';
import { AgentState } from './state';
import { agentNode, toolNode, analyzeNode } from './nodes';
import { AIMessage } from '@langchain/core/messages';

// Determine if the agent wants to use a tool or is finished
function shouldContinue(state: typeof AgentState.State) {
  const lastMessage = state.messages[state.messages.length - 1];

  // If the LLM returned tool calls, route to tool execution
  if (
    lastMessage instanceof AIMessage &&
    lastMessage.tool_calls &&
    lastMessage.tool_calls.length > 0
  ) {
    return 'tools';
  }

  // Otherwise, analyze what we have
  return 'analyze';
}

// Determine if we should continue scraping or wrap up
function shouldFinish(state: typeof AgentState.State) {
  if (state.phase === 'complete') {
    return 'end';
  }
  return 'agent';
}

// Build the graph
const workflow = new StateGraph(AgentState)
  // Add nodes
  .addNode('agent', agentNode)
  .addNode('tools', toolNode)
  .addNode('analyze', analyzeNode)

  // Set entry point
  .addEdge('__start__', 'agent')

  // Agent -> tools (if tool call) or analyze (if no tool call)
  .addConditionalEdges('agent', shouldContinue, {
    tools: 'tools',
    analyze: 'analyze',
  })

  // Tools -> agent (return results to LLM)
  .addEdge('tools', 'agent')

  // Analyze -> agent (continue) or end (done)
  .addConditionalEdges('analyze', shouldFinish, {
    agent: 'agent',
    end: END,
  });

export const app = workflow.compile();
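
To make the control flow easier to follow, the sketch below traces the same agent, tools, analyze loop with stubbed nodes. It is not LangGraph code: `runTrace` is a hypothetical helper, the "agent" follows a scripted list of tool-call counts, and every tool call is assumed to yield one extracted page. The step cap of 25 mirrors LangGraph's default recursion limit.

```typescript
type NodeName = 'agent' | 'tools' | 'analyze' | 'end';

interface TraceState {
  pendingToolCalls: number; // tool calls requested on the current agent turn
  pagesExtracted: number;   // pages gathered so far
  plannedCalls: number[];   // scripted agent behaviour, one entry per turn
}

// Same routing rules as shouldContinue and shouldFinish.
function nextNode(node: NodeName, s: TraceState): NodeName {
  switch (node) {
    case 'agent':
      return s.pendingToolCalls > 0 ? 'tools' : 'analyze';
    case 'tools':
      return 'agent';
    case 'analyze':
      return s.pagesExtracted >= 3 ? 'end' : 'agent';
    default:
      return 'end';
  }
}

function runTrace(plannedCalls: number[], maxSteps = 25): NodeName[] {
  const s: TraceState = {
    pendingToolCalls: 0,
    pagesExtracted: 0,
    plannedCalls: [...plannedCalls],
  };
  const path: NodeName[] = [];
  let node: NodeName = 'agent';

  while (node !== 'end' && path.length < maxSteps) {
    path.push(node);
    if (node === 'agent') {
      // An exhausted script means the agent stops asking for tools.
      s.pendingToolCalls = s.plannedCalls.shift() ?? 0;
    } else if (node === 'tools') {
      s.pagesExtracted += s.pendingToolCalls;
      s.pendingToolCalls = 0;
    }
    node = nextNode(node, s);
  }
  return path;
}

console.log(runTrace([1, 2, 0]).join(' -> '));
// agent -> tools -> agent -> tools -> agent -> analyze

console.log(runTrace([0]).length); // 25
```

The second trace is the failure mode to watch for: an agent that never calls a tool bounces between `agent` and `analyze` until the step cap is reached, so the exit condition in `analyze` depends on the state fields actually being updated.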

Step 6: Run the Agent

Typescript

// src/index.ts
import 'dotenv/config';
import { HumanMessage } from '@langchain/core/messages';
import { app } from './graph';

async function main() {
  const result = await app.invoke({
    messages: [
      new HumanMessage(
        'Research the top 3 MCP server implementations for web scraping. ' +
        'Find their websites, extract their key features, and compare pricing.'
      ),
    ],
  });

  // Print final state
  console.log('--- Research Complete ---');
  console.log('Credits used:', result.creditsUsed);
  console.log('URLs discovered:', result.discoveredUrls.length);
  console.log('Pages extracted:', Object.keys(result.extractedContent).length);
  console.log('\nFinal response:');
  console.log(result.messages[result.messages.length - 1].content);
}

main().catch(console.error);

Run it:

Bash

npx tsx src/index.ts

The agent will search the web, discover relevant pages, extract content from the most promising results, and synthesize a comparison -- all while tracking credit usage in the graph state.
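
One detail when printing the final response: a LangChain message's `content` can be either a plain string or an array of content blocks, so logging it directly may print an array rather than text. The helper below is a minimal sketch for flattening it. `contentToText` is a hypothetical name, and the two types are simplified stand-ins for the real ones in `@langchain/core`.

```typescript
// Simplified stand-ins for the message content types in @langchain/core.
type ContentBlock = { type: string; text?: string };
type MessageContent = string | ContentBlock[];

// Join the text blocks and ignore everything else (e.g. tool_use blocks).
function contentToText(content: MessageContent): string {
  if (typeof content === 'string') return content;
  return content
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('');
}

console.log(contentToText('Plain answer'));
// Plain answer

console.log(
  contentToText([
    { type: 'text', text: 'Comparison: ' },
    { type: 'tool_use' },
    { type: 'text', text: 'three servers reviewed.' },
  ])
);
// Comparison: three servers reviewed.
```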

Credit Cost Reference

Credits	Tools	LangGraph Node Role
1	fetch_url, extract_text, extract_links, extract_metadata	Lightweight data-gathering nodes
2	scrape_structured, extract_content, map_site, process_document, localization	Extraction, discovery, and document processing nodes
3	track_changes, analyze_content	Change-tracking and analysis nodes
4	summarize_content, crawl_deep	Summary and multi-page crawling nodes
5	search_web, batch_scrape, scrape_with_actions, stealth_mode	Research and bulk-operation nodes
10	deep_research	Comprehensive analysis (use as a single-node subgraph)

Typical LangGraph agent run: 5 (one search_web call) + 6 (three extract_content calls at 2 credits each) + 0 (LLM analysis uses no CrawlForge credits) = 11 credits.
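
The same arithmetic can be written as a small estimator for planning a run's budget. This is a sketch: `estimateCredits` is a hypothetical helper, and the prices are copied from the table above for the four tools wrapped in this guide.

```typescript
// Credit prices for the four tools defined in src/tools.ts.
const CREDIT_COST = {
  fetch_url: 1,
  extract_content: 2,
  scrape_structured: 2,
  search_web: 5,
} as const;

type ToolName = keyof typeof CREDIT_COST;

// Sum price x call count over every tool in the plan.
function estimateCredits(plan: Partial<Record<ToolName, number>>): number {
  let total = 0;
  for (const name of Object.keys(plan) as ToolName[]) {
    total += CREDIT_COST[name] * (plan[name] ?? 0);
  }
  return total;
}

// The typical run: one search plus three content extractions.
console.log(estimateCredits({ search_web: 1, extract_content: 3 })); // 11

// Swapping extract_content for raw fetch_url calls.
console.log(estimateCredits({ search_web: 1, fetch_url: 3 })); // 8
```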

LangGraph vs Direct LangChain for Scraping

Aspect	LangGraph	Direct LangChain
State Management	Built-in, typed, persistent	Manual, requires custom code
Conditional Logic	First-class conditional edges	If/else in chain functions
Credit Tracking	Accumulated in graph state through a reducer	Manual counter
Error Recovery	Route errors to fallback nodes	Try/catch in chain
Complexity	Higher initial setup	Simpler for linear workflows
Best For	Multi-step research with branching logic	Simple fetch-and-process pipelines

Use LangGraph when your scraping agent needs to make decisions based on intermediate results. Use direct LangChain (see our LangChain integration guide) when the workflow is linear.

Next Steps

LangGraph Documentation -- official LangGraph guides
5 Ways to Use CrawlForge with LangChain -- simpler LangChain patterns
Build a Research Assistant -- related agent architecture
CrawlForge API Reference -- full tool endpoint documentation

Build intelligent scraping agents today. Sign up for CrawlForge with 1,000 free credits, wire the tools into your LangGraph graph, and let your agent decide what to scrape next.
