LlamaIndex Integration

Integrate CrawlForge MCP with LlamaIndex to build data connectors, indexes, and query engines that can scrape the web. The combination suits retrieval-augmented generation (RAG) applications and knowledge bases.

Use Cases

  • Web Data Connectors: Create data connectors that fetch and index web content automatically.
  • Knowledge Bases: Build searchable knowledge bases from web pages and documents.
  • Query Engines: Create query engines with real-time web data retrieval.
  • Document Processing: Extract and process documents from URLs for indexing.

Installation

Install LlamaIndex and the CrawlForge MCP adapter.

You'll also need a CrawlForge API key from the dashboard.

Web Data Connector

Use CrawlForge as a data connector to fetch and load web documents.
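
As a rough illustration of the connector idea, the sketch below turns scraped pages into text-plus-metadata records ready for indexing. The `ScrapedPage` shape, the `PageFetcher` signature, and `loadWebDocuments` are all hypothetical names introduced here; the actual CrawlForge response format may differ, so treat the fetcher as a thin wrapper you would write around whichever tool you call.

```typescript
// Hypothetical shape of a scraped page; the real CrawlForge response may differ.
interface ScrapedPage {
  url: string;
  title: string;
  content: string;
}

// Plain text-plus-metadata record, the shape a document loader typically produces.
interface WebDocument {
  text: string;
  metadata: { source: string; title: string };
}

// Hypothetical fetcher signature: wrap your own scraping call in a function like this.
type PageFetcher = (url: string) => Promise<ScrapedPage>;

async function loadWebDocuments(
  urls: string[],
  fetchPage: PageFetcher
): Promise<WebDocument[]> {
  const docs: WebDocument[] = [];
  for (const url of urls) {
    const page = await fetchPage(url);
    const text = page.content.trim();
    // Skip pages with no extractable text so they never reach the index.
    if (text.length === 0) continue;
    docs.push({ text, metadata: { source: page.url, title: page.title } });
  }
  return docs;
}
```

Keeping the source URL in the metadata makes it possible to cite or refresh a page later without refetching everything.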

Tip: Use extract_content for clean article extraction or extract_text for full page text.

Vector Store Index

Create a vector store index from web documents for semantic search.
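
To make the idea of semantic search over an index concrete, here is a minimal in-memory sketch: each chunk is stored with an embedding vector, and a query returns the chunks with the highest cosine similarity. `TinyVectorIndex` and the `Embedder` type are hypothetical and exist only for illustration; a real setup would rely on LlamaIndex's own index classes and a proper embedding model rather than this toy.

```typescript
// Hypothetical embedding function; in practice this would call an embedding model.
type Embedder = (text: string) => number[];

interface IndexedChunk {
  text: string;
  source: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class TinyVectorIndex {
  private readonly embed: Embedder;
  private readonly chunks: IndexedChunk[] = [];

  constructor(embed: Embedder) {
    this.embed = embed;
  }

  add(text: string, source: string): void {
    this.chunks.push({ text, source, vector: this.embed(text) });
  }

  // Return the topK chunks ranked by similarity to the query, best first.
  search(query: string, topK: number): Array<{ text: string; source: string; score: number }> {
    const queryVector = this.embed(query);
    return this.chunks
      .map((c) => ({ text: c.text, source: c.source, score: cosineSimilarity(queryVector, c.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}
```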

Query Engine with Tools

Create a query engine that can fetch real-time web data on demand.
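
One way to picture tools inside a query engine: each tool carries a name, a description the agent can read when choosing, and a handler. The sketch below is a hypothetical registry (`WebTool` and `ToolRegistry` are illustrative names, not part of LlamaIndex or CrawlForge) that lists tool descriptions and dispatches a call by name; the handlers are left to you and would wrap the real tool calls.

```typescript
// Hypothetical tool description: name, a summary for the agent, and a handler.
interface WebTool {
  name: string;
  description: string;
  run: (input: { url: string }) => Promise<string>;
}

class ToolRegistry {
  private readonly tools = new Map<string, WebTool>();

  register(tool: WebTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // One-line summaries that an agent prompt could include for tool selection.
  describe(): string[] {
    return Array.from(this.tools.values()).map((t) => `${t.name}: ${t.description}`);
  }

  // Run the tool the agent picked; unknown names fail loudly instead of silently.
  async invoke(name: string, input: { url: string }): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.run(input);
  }
}
```

Clear, distinct descriptions matter here, since the description is usually all the agent has to go on when deciding which tool fits a query.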

Agent Tips: The agent will automatically choose which tools to use based on the query. Set verbose=true to see tool selection.

Custom Web Retriever

Build a custom retriever that fetches web data based on queries.
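
A minimal sketch of such a retriever, under stated assumptions: the `WebSearch` function is a hypothetical wrapper around whatever search or scrape tool you use, and the scoring is a simple term-overlap heuristic standing in for a real relevance model. `WebRetriever` does not extend any LlamaIndex base class here; it only shows the query-in, scored-nodes-out shape.

```typescript
interface WebResult {
  url: string;
  text: string;
}

interface ScoredNode {
  url: string;
  text: string;
  score: number;
}

// Hypothetical search function; wrap your own web search or scrape call here.
type WebSearch = (query: string) => Promise<WebResult[]>;

// Fraction of distinct query terms that appear in the text (0 to 1).
function overlapScore(query: string, text: string): number {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
  if (terms.size === 0) return 0;
  const words = new Set(text.toLowerCase().split(/\W+/));
  let hits = 0;
  terms.forEach((t) => {
    if (words.has(t)) hits++;
  });
  return hits / terms.size;
}

class WebRetriever {
  private readonly search: WebSearch;
  private readonly topK: number;

  constructor(search: WebSearch, topK: number = 3) {
    this.search = search;
    this.topK = topK;
  }

  // Fetch fresh results for the query, drop irrelevant ones, return the best topK.
  async retrieve(query: string): Promise<ScoredNode[]> {
    const results = await this.search(query);
    return results
      .map((r) => ({ url: r.url, text: r.text, score: overlapScore(query, r.text) }))
      .filter((n) => n.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, this.topK);
  }
}
```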

Batch Processing with Async

Process multiple URLs efficiently with async batch operations.
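
The sketch below shows one common pattern for this on the client side: split the URL list into fixed-size batches, run each batch in parallel with `Promise.all`, and record failures per URL instead of aborting the whole run. `processInBatches` and the `fetchOne` callback are hypothetical names; `fetchOne` stands in for whatever single-URL call you make.

```typescript
// Per-URL outcome, so one failed page does not discard the rest of the batch.
type FetchResult<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: string };

async function processInBatches<T>(
  urls: string[],
  batchSize: number,
  fetchOne: (url: string) => Promise<T>
): Promise<FetchResult<T>[]> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const results: FetchResult<T>[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // URLs within a batch run in parallel; batches themselves run one after another.
    const settled = await Promise.all(
      batch.map(async (url): Promise<FetchResult<T>> => {
        try {
          return { url, ok: true, value: await fetchOne(url) };
        } catch (err) {
          const message = err instanceof Error ? err.message : String(err);
          return { url, ok: false, error: message };
        }
      })
    );
    results.push(...settled);
  }
  return results;
}
```

Results come back in the same order as the input URLs, which keeps it simple to pair each outcome with its source.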

Performance Tip: Use batch_scrape when processing multiple URLs. It is optimized for parallel execution and costs only 1 credit per URL.

Best Practices

  • Choose Efficient Tools — Use batch_scrape for multiple URLs, extract_content for clean text
  • Implement Caching — Cache indexed documents to avoid redundant fetches and save credits
  • Use Async Operations — Leverage async/await for parallel processing to speed up bulk operations
  • Monitor Credits — Track credit usage in document metadata and set up alerts in your dashboard
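
The caching and credit-monitoring practices above can be sketched together as a small wrapper around a fetch function. Everything here is hypothetical: `CachedFetcher` is an illustrative name, the cache is a plain in-memory map with a time-to-live, and the default of 1 credit per fetch is an assumption you should replace with the actual cost of the tool you call.

```typescript
interface CachedFetcherStats {
  hits: number;
  misses: number;
  creditsSpent: number;
}

class CachedFetcher {
  private readonly cache = new Map<string, { value: string; expiresAt: number }>();
  private readonly stats: CachedFetcherStats = { hits: 0, misses: 0, creditsSpent: 0 };
  private readonly fetchPage: (url: string) => Promise<string>;
  private readonly ttlMs: number;
  private readonly creditsPerFetch: number;
  private readonly now: () => number;

  constructor(
    fetchPage: (url: string) => Promise<string>,
    ttlMs: number,
    creditsPerFetch: number = 1, // assumed cost; adjust to the tool you actually use
    now: () => number = () => Date.now()
  ) {
    this.fetchPage = fetchPage;
    this.ttlMs = ttlMs;
    this.creditsPerFetch = creditsPerFetch;
    this.now = now;
  }

  async get(url: string): Promise<string> {
    const entry = this.cache.get(url);
    if (entry && entry.expiresAt > this.now()) {
      // Fresh cache entry: no network call and no credits spent.
      this.stats.hits++;
      return entry.value;
    }
    const value = await this.fetchPage(url);
    this.stats.misses++;
    this.stats.creditsSpent += this.creditsPerFetch;
    this.cache.set(url, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Return a copy so callers cannot mutate the internal counters.
  getStats(): CachedFetcherStats {
    return { ...this.stats };
  }
}
```

Logging `getStats()` periodically gives a rough local view of spend that you can compare against the dashboard.
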
Ready to build with LlamaIndex?
Explore all 23 CrawlForge tools or check out other integrations.