LangChain Integration
Integrate CrawlForge MCP with LangChain to build AI agents with web scraping capabilities. Use it as a document loader, as an agent tool, or inside a custom retrieval chain.
Use Cases
Document Loaders
Load web pages as documents for vector stores and RAG applications
AI Agents
Give agents web scraping tools to fetch real-time data
Retrieval Chains
Build custom chains that fetch and process web content
Research Pipelines
Create automated research workflows with the deep_research tool
Installation
Install LangChain and the CrawlForge MCP adapter.
Bash
npm install langchain @langchain/core @langchain/community @langchain/openai
npm install @crawlforge/langchain-adapter
You'll also need a CrawlForge API key from the dashboard.
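The examples on this page read the key with `process.env.CRAWLFORGE_API_KEY!`. The `!` only silences the TypeScript compiler; it does not fail if the variable is unset. A minimal sketch of a startup check follows. `requireEnv` is a hypothetical helper written for illustration, not part of the CrawlForge adapter.

```typescript
// Hypothetical helper (not part of the CrawlForge adapter): fail fast when a
// required environment variable is missing or blank, instead of passing
// `undefined` on to a client constructor.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a Node.js application this could be called once at startup, for example `const apiKey = requireEnv('CRAWLFORGE_API_KEY', process.env);`, and the result passed wherever the examples use `process.env.CRAWLFORGE_API_KEY!`.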
Document Loader
Use CrawlForge as a document loader to fetch web pages for RAG applications.
Typescript
import { CrawlForgeLoader } from '@crawlforge/langchain-adapter';
// Initialize the loader
const loader = new CrawlForgeLoader({
  apiKey: process.env.CRAWLFORGE_API_KEY!,
  tool: 'extract_text', // or 'fetch_url', 'extract_content'
});
// Load a single document
const docs = await loader.load('https://example.com');
console.log(docs[0].pageContent); // Clean text content
console.log(docs[0].metadata); // URL, title, credits used
// Load multiple documents
const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
];
const allDocs = await loader.loadMany(urls);
console.log(`Loaded ${allDocs.length} documents`);
Best Practice: Use extract_text for clean content or extract_content for article extraction.
RAG Pipeline with Vector Store
Build a complete RAG pipeline with CrawlForge document loader and vector store.
Typescript
import { CrawlForgeLoader } from '@crawlforge/langchain-adapter';
import { OpenAIEmbeddings } from '@langchain/openai';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { RetrievalQAChain } from 'langchain/chains';
import { ChatOpenAI } from '@langchain/openai';
// 1. Load documents from web pages
const loader = new CrawlForgeLoader({
  apiKey: process.env.CRAWLFORGE_API_KEY!,
  tool: 'extract_content'
});
const docs = await loader.loadMany([
  'https://example.com/doc1',
  'https://example.com/doc2',
  'https://example.com/doc3'
]);
// 2. Create embeddings and vector store
const embeddings = new OpenAIEmbeddings();
const vectorStore = await MemoryVectorStore.fromDocuments(
  docs,
  embeddings
);
// 3. Create retrieval chain
const model = new ChatOpenAI({ modelName: 'gpt-4' });
const chain = RetrievalQAChain.fromLLM(
  model,
  vectorStore.asRetriever()
);
// 4. Query the knowledge base
// invoke() replaces the deprecated call() and returns the same { text } result
const response = await chain.invoke({
  query: 'What are the key points from these documents?'
});
console.log(response.text);
Agent Tools
Give LangChain agents web scraping capabilities with CrawlForge tools.
Typescript
import { CrawlForgeTool } from '@crawlforge/langchain-adapter';
import { initializeAgentExecutorWithOptions } from 'langchain/agents';
import { ChatOpenAI } from '@langchain/openai';
// Create CrawlForge tools
const tools = [
  new CrawlForgeTool({
    name: 'web_search',
    description: 'Search the web for information',
    apiKey: process.env.CRAWLFORGE_API_KEY!,
    tool: 'search_web'
  }),
  new CrawlForgeTool({
    name: 'fetch_page',
    description: 'Fetch and extract content from a URL',
    apiKey: process.env.CRAWLFORGE_API_KEY!,
    tool: 'extract_content'
  }),
  new CrawlForgeTool({
    name: 'deep_research',
    description: 'Perform comprehensive research on a topic',
    apiKey: process.env.CRAWLFORGE_API_KEY!,
    tool: 'deep_research'
  })
];
// Initialize agent
const model = new ChatOpenAI({ modelName: 'gpt-4', temperature: 0 });
const executor = await initializeAgentExecutorWithOptions(
  tools,
  model,
  {
    agentType: 'openai-functions',
    verbose: true
  }
);
// Run agent
// invoke() replaces the deprecated call() and returns the same { output } result
const result = await executor.invoke({
  input: 'Research the latest developments in quantum computing'
});
console.log(result.output);
Agent Tips: Use descriptive tool names and descriptions to help the LLM choose the right tool. Set verbose: true to see agent reasoning.
Custom Retrieval Chain
Build a custom chain that searches, fetches, and summarizes web content.
Typescript
import { CrawlForgeLoader } from '@crawlforge/langchain-adapter';
import { PromptTemplate } from '@langchain/core/prompts';
import { RunnableSequence } from '@langchain/core/runnables';
import { ChatOpenAI } from '@langchain/openai';
import { StringOutputParser } from '@langchain/core/output_parsers';
// Initialize CrawlForge loader
const loader = new CrawlForgeLoader({
  apiKey: process.env.CRAWLFORGE_API_KEY!,
  tool: 'deep_research'
});
// Create custom chain
const prompt = PromptTemplate.fromTemplate(
  `Based on the following research, answer the question:\n\n{context}\n\nQuestion: {question}\n\nAnswer:`
);
const model = new ChatOpenAI({ modelName: 'gpt-4' });
const chain = RunnableSequence.from([
  {
    // Run deep_research on the question and pass the result to the prompt as context
    context: async (input: { question: string }) => {
      const docs = await loader.load(input.question);
      return docs[0].pageContent;
    },
    // Pass the original question through unchanged
    question: (input: { question: string }) => input.question
  },
  prompt,
  model,
  new StringOutputParser()
]);
// Run the chain
const result = await chain.invoke({
  question: 'What are the latest AI safety research findings?'
});
console.log(result);
Best Practices
- Choose the Right Tool: Use extract_text (1 credit) for simple content and deep_research (10 credits) for comprehensive analysis
- Implement Caching: Cache fetched documents to avoid redundant API calls and save credits
- Handle Rate Limits: Implement retry logic with exponential backoff for production applications
- Monitor Credit Usage: Check document metadata for credit usage and set up alerts in your dashboard
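The caching and retry recommendations above can be sketched without any library support. The code below is an illustration under assumptions: `withRetry`, `withCache`, and `LoadFn` are hypothetical names, the cache is a plain in-memory `Map`, and every error is retried because how the adapter reports a rate limit is not specified here. A production version would likely retry only on rate-limit responses.

```typescript
// Sketch only: LoadFn stands in for any async fetch keyed by URL,
// for example (url) => loader.load(url).
type LoadFn<T> = (url: string) => Promise<T>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry a failing call with exponential backoff. The wait doubles after each
// failed attempt: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Do not sleep after the final attempt; just rethrow below.
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * Math.pow(2, attempt));
      }
    }
  }
  throw lastError;
}

// Wrap a load function with an in-memory cache keyed by URL, so repeated
// requests for the same page reuse the first result instead of calling again.
function withCache<T>(load: LoadFn<T>): LoadFn<T> {
  const cache = new Map<string, Promise<T>>();
  return (url) => {
    let pending = cache.get(url);
    if (!pending) {
      pending = load(url);
      // Drop failed entries so a later call can try again.
      pending.catch(() => cache.delete(url));
      cache.set(url, pending);
    }
    return pending;
  };
}
```

The two wrappers compose: `const load = withCache((url) => withRetry(() => loader.load(url)));` would give a cached, retrying version of the document loader call shown earlier. The cache lives only for the lifetime of the process; persisting it is left out of this sketch.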
Ready to build with LangChain?
Explore all 23 CrawlForge tools or check out other integrations.