LlamaIndex (15 minutes)

LlamaIndex Integration

Integrate CrawlForge MCP with LlamaIndex to build data connectors, indexes, and query engines backed by web scraping. This setup suits retrieval-augmented generation (RAG) applications and knowledge bases.

Use Cases

Web Data Connectors

Create data connectors that fetch and index web content automatically

Knowledge Bases

Build searchable knowledge bases from web pages and documents

Query Engines

Create query engines with real-time web data retrieval

Document Processing

Extract and process documents from URLs for indexing

Installation

Install LlamaIndex and the CrawlForge MCP adapter.

You'll also need a CrawlForge API key from the dashboard.

Web Data Connector

Use CrawlForge as a data connector to fetch and load web documents.

Tip: Use extract_content for clean article extraction or extract_text for full page text.
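
As a rough illustration of the connector idea, the sketch below turns a list of URLs into text-plus-metadata records of the kind a LlamaIndex document holds. The `ToolCaller` type, the `WebDocument` shape, the `loadWebDocuments` helper, and the `{ url }` argument are all assumptions made for this example; only the tool names `extract_content` and `extract_text` come from this page. A real setup would pass in a function backed by an MCP client.

```typescript
// Hypothetical signature for "call an MCP tool and get text back".
// A real MCP client would be wrapped to match it.
type ToolCaller = (tool: string, args: Record<string, unknown>) => Promise<string>;

// Minimal document shape (text plus metadata), defined here for illustration.
interface WebDocument {
  text: string;
  metadata: { url: string; tool: string; fetchedAt: string };
}

// The two extraction tools named in the tip above.
type ExtractTool = "extract_content" | "extract_text";

async function loadWebDocuments(
  callTool: ToolCaller,
  urls: string[],
  tool: ExtractTool = "extract_content",
): Promise<WebDocument[]> {
  const docs: WebDocument[] = [];
  for (const url of urls) {
    // Assumption: the tool takes { url } and resolves to the extracted text.
    const text = await callTool(tool, { url });
    if (text.trim().length === 0) continue; // skip pages with no usable text
    docs.push({
      text,
      metadata: { url, tool, fetchedAt: new Date().toISOString() },
    });
  }
  return docs;
}
```

Keeping the tool name in the metadata makes it possible to tell later whether a record came from clean article extraction or from full page text.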

Vector Store Index

Create a vector store index from web documents for semantic search.
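
To make the indexing step concrete without depending on any library, here is a toy stand-in for a vector index. It is not LlamaIndex's implementation: word counts replace a real embedding model, and `ToyVectorIndex`, `embed`, and `cosine` are names invented for this sketch. It only shows the shape of the workflow: embed each document once, then rank documents by similarity to an embedded query.

```typescript
interface IndexedDoc {
  url: string;
  text: string;
  vector: Map<string, number>;
}

// Stand-in "embedding": lowercase word counts.
// A real index would call an embedding model here instead.
function embed(text: string): Map<string, number> {
  const vector = new Map<string, number>();
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  words.forEach((word) => vector.set(word, (vector.get(word) ?? 0) + 1));
  return vector;
}

// Cosine similarity between two sparse vectors; 0 when either is empty.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  a.forEach((count, word) => {
    dot += count * (b.get(word) ?? 0);
  });
  const norm = (v: Map<string, number>): number => {
    let sum = 0;
    v.forEach((x) => {
      sum += x * x;
    });
    return Math.sqrt(sum);
  };
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

class ToyVectorIndex {
  private docs: IndexedDoc[] = [];

  // Embed once at insert time so searches only embed the query.
  add(text: string, url: string): void {
    this.docs.push({ url, text, vector: embed(text) });
  }

  // Return the topK documents ranked by similarity to the query.
  search(query: string, topK = 2): { url: string; score: number }[] {
    const q = embed(query);
    return this.docs
      .map((d) => ({ url: d.url, score: cosine(q, d.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}
```

Word counts only match exact terms, so this toy misses synonyms that a real embedding model would catch; that gap is the main reason to use a proper vector store index.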

Query Engine with Tools

Create a query engine that can fetch real-time web data on demand.

Agent Tips: The agent will automatically choose which tools to use based on the query. Set verbose=true to see tool selection.
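
One way to picture what verbose output adds is a wrapper that logs every tool call before running it. The sketch below is an illustration under assumptions, not the agent's actual mechanism: `WebTool`, `withVerbose`, and the `{ url }` input shape are hypothetical, and the `description` field stands for the text an agent's language model would read when deciding which tool fits a query.

```typescript
interface WebTool {
  name: string;
  // Text the agent's language model would read when choosing a tool.
  description: string;
  run: (input: { url: string }) => Promise<string>;
}

// When verbose is true, return copies of the tools that log each call.
// When false, return the tools unchanged.
function withVerbose(
  tools: WebTool[],
  verbose: boolean,
  log: (line: string) => void = (line) => console.log(line),
): WebTool[] {
  if (!verbose) return tools;
  return tools.map(
    (tool): WebTool => ({
      ...tool,
      run: async (input: { url: string }) => {
        log(`tool=${tool.name} url=${input.url}`);
        return tool.run(input);
      },
    }),
  );
}
```

Passing the logger as a parameter keeps the sketch easy to test and lets the output go to a file or a tracing system instead of the console.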

Custom Web Retriever

Build a custom retriever that fetches web data based on queries.
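
A minimal sketch of such a retriever, assuming two injected functions: `search`, which maps a query to candidate URLs, and `fetchText`, which returns page text (for example by calling extract_content). Both are hypothetical, as are `retrieveFromWeb` and the simple term-overlap score, which stands in for whatever relevance scoring a real retriever would use.

```typescript
interface RetrievedNode {
  url: string;
  text: string;
  score: number;
}

// Hypothetical dependencies supplied by the caller.
interface RetrieverDeps {
  search: (query: string) => Promise<string[]>;
  fetchText: (url: string) => Promise<string>;
}

// Fraction of distinct query terms that also appear in the text (0 to 1).
function termOverlap(query: string, text: string): number {
  const terms = Array.from(new Set(query.toLowerCase().match(/[a-z0-9]+/g) ?? []));
  if (terms.length === 0) return 0;
  const haystack = new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  return terms.filter((term) => haystack.has(term)).length / terms.length;
}

async function retrieveFromWeb(
  deps: RetrieverDeps,
  query: string,
  topK = 3,
): Promise<RetrievedNode[]> {
  const urls = await deps.search(query);
  // Fetch all candidate pages in parallel, then score each against the query.
  const nodes = await Promise.all(
    urls.map(async (url): Promise<RetrievedNode> => {
      const text = await deps.fetchText(url);
      return { url, text, score: termOverlap(query, text) };
    }),
  );
  return nodes
    .filter((node) => node.score > 0) // drop pages with no matching terms
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```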

Batch Processing with Async

Process multiple URLs efficiently with async batch operations.

Performance Tip: Use batch_scrape when processing multiple URLs. It is optimized for parallel execution and costs only 1 credit per URL.
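
The batching pattern can be sketched as follows. The `BatchScrape` function type is an assumption that stands in for a call to batch_scrape; its argument and result shapes, the `scrapeInBatches` helper, and the batch size are all hypothetical. The credit estimate simply applies the 1 credit per URL figure from the tip above.

```typescript
interface ScrapedPage {
  url: string;
  text: string;
}

// Hypothetical wrapper around a batch_scrape call: URLs in, pages out.
type BatchScrape = (urls: string[]) => Promise<ScrapedPage[]>;

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function scrapeInBatches(
  batchScrape: BatchScrape,
  urls: string[],
  batchSize = 10,
): Promise<{ pages: ScrapedPage[]; estimatedCredits: number }> {
  // Deduplicate first so the same URL is not fetched (and billed) twice.
  const unique = Array.from(new Set(urls));
  // Start every batch at once and wait for all of them to finish.
  const results = await Promise.all(
    chunk(unique, batchSize).map((batch) => batchScrape(batch)),
  );
  const pages = ([] as ScrapedPage[]).concat(...results);
  // Estimate only: 1 credit per URL, per the tip above.
  return { pages, estimatedCredits: unique.length };
}
```

Because `Promise.all` rejects as soon as any batch fails, a production version would likely want per-batch error handling or `Promise.allSettled` so one bad batch does not discard the rest.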

Best Practices

Choose Efficient Tools

Use batch_scrape for multiple URLs and extract_content for clean text

Implement Caching

Cache indexed documents to avoid redundant fetches and save credits

Use Async Operations

Leverage async/await for parallel processing to speed up bulk operations

Monitor Credits

Track credit usage in document metadata and set up alerts in your dashboard
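
The caching and credit-monitoring practices above can be combined in a small wrapper. This is a sketch under assumptions: `PageCache` is a hypothetical in-memory helper, `fetchText` stands for whatever function actually calls a CrawlForge tool, and the miss counter is only a local proxy for fetches that would consume credits, not a reading from the dashboard.

```typescript
interface CachedPage {
  text: string;
  fetchedAt: number; // milliseconds, from the injected clock
}

class PageCache {
  private entries = new Map<string, CachedPage>();
  private fetchText: (url: string) => Promise<string>;
  private ttlMs: number;
  private now: () => number;

  hits = 0; // requests served from the cache
  misses = 0; // requests that triggered a real fetch

  constructor(
    fetchText: (url: string) => Promise<string>,
    ttlMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.fetchText = fetchText;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(url: string): Promise<string> {
    const cached = this.entries.get(url);
    // Serve from cache while the entry is younger than the TTL.
    if (cached !== undefined && this.now() - cached.fetchedAt < this.ttlMs) {
      this.hits += 1;
      return cached.text;
    }
    // Otherwise fetch again and refresh the entry.
    this.misses += 1;
    const text = await this.fetchText(url);
    this.entries.set(url, { text, fetchedAt: this.now() });
    return text;
  }
}
```

The clock is injected so expiry can be tested without waiting. For an index that outlives the process, the same idea would need a persistent store rather than an in-memory map.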

Ready to build with LlamaIndex?
Explore all 18 CrawlForge tools or check out other integrations.