
Building an AI Research Assistant with Claude and MCP

CrawlForge Team
Engineering Team
January 2, 2026
12 min read
Updated April 14, 2026

Imagine an AI research assistant that can:

  • Search the web for relevant sources
  • Extract and verify information from multiple websites
  • Cross-reference facts for accuracy
  • Synthesize findings into a coherent summary with citations

With Claude, the Model Context Protocol (MCP), and CrawlForge, you can build this in an afternoon. This guide walks you through the architecture, implementation, and production considerations.

The Vision: Research Like a Human

Without tools, an LLM is limited to its training data. When you ask GPT-4 or Claude a question, the model can only draw on what it saw during training. Humans don't work that way: we search, read, verify, and synthesize new information.

An AI research assistant should:

  1. Understand intent - Break down complex queries into searchable topics
  2. Discover sources - Find relevant web pages, documentation, articles
  3. Extract information - Pull out key facts, quotes, and data
  4. Verify accuracy - Cross-check information across multiple sources
  5. Synthesize results - Combine findings into a clear, cited answer

Let's build it.

Architecture Overview

Our research assistant has three layers:

┌─────────────────────────────────────────────────┐
│ LLM Layer (Claude/GPT-4)                        │
│ - Query understanding                           │
│ - Source relevance scoring                      │
│ - Information synthesis                         │
└─────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────┐
│ MCP Server (CrawlForge)                         │
│ - search_web (5 credits)                        │
│ - extract_content (2 credits)                   │
│ - deep_research (10 credits)                    │
└─────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────┐
│ Web Data Layer                                  │
│ - Google Search results                         │
│ - Website content                               │
│ - Structured data                               │
└─────────────────────────────────────────────────┘

Data Flow:

  1. User submits research query
  2. LLM expands query into search terms
  3. CrawlForge searches the web and extracts content
  4. LLM verifies and synthesizes information
  5. Return structured answer with citations
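
The data flow above can be sketched as one orchestrating function. This is a minimal sketch rather than the guide's implementation: `runResearch`, `ResearchSteps`, `Page`, and `Fact` are hypothetical names, and each step is passed in as a function so the flow can be exercised without calling any API.

```typescript
// Hypothetical shapes for illustration only.
interface Page { url: string; content: string }
interface Fact { claim: string; sources: string[] }

interface ResearchSteps {
  expand: (query: string) => Promise<string[]>; // step 2: query -> search terms
  gather: (terms: string[]) => Promise<Page[]>; // step 3: search and extract
  verify: (pages: Page[]) => Promise<Fact[]>;   // step 4: cross-check facts
}

interface ResearchAnswer { query: string; facts: Fact[]; citations: string[] }

async function runResearch(query: string, steps: ResearchSteps): Promise<ResearchAnswer> {
  const terms = await steps.expand(query);
  const pages = await steps.gather(terms);
  const facts = await steps.verify(pages);

  // Step 5: list each cited URL once, in first-seen order.
  const citations: string[] = [];
  for (const fact of facts) {
    for (const url of fact.sources) {
      if (citations.indexOf(url) === -1) citations.push(url);
    }
  }
  return { query, facts, citations };
}
```

Injecting the steps keeps the orchestration testable with stubs; the real query expansion, scraping, and verification functions can be plugged in once they exist.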

Setting Up the Project

We'll use TypeScript, Claude's API (or OpenAI), and the CrawlForge MCP server.

Prerequisites

Bash
node -v  # 18+ required
npm -v   # 9+ required

Initialize the Project

Bash
mkdir ai-research-assistant
cd ai-research-assistant
npm init -y
npm install @anthropic-ai/sdk dotenv
npm install --save-dev typescript @types/node tsx
npx tsc --init

Environment Setup

Create .env:

Bash
# Claude API (or use OPENAI_API_KEY)
ANTHROPIC_API_KEY=sk-ant-xxxxx

# CrawlForge API
CRAWLFORGE_API_KEY=cf_live_xxxxx

Get your CrawlForge API key at crawlforge.dev/signup (1,000 free credits).
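
A missing key is easier to diagnose at startup than in the middle of a research run. The helper below is a small sketch of that check; `requireEnv` is a hypothetical name, and it takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
// Hypothetical helper: returns the requested variables, or throws listing the missing ones.
function requireEnv(
  env: Record<string, string | undefined>,
  names: string[]
): Record<string, string> {
  const missing = names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  const values: Record<string, string> = {};
  for (const name of names) {
    values[name] = env[name] as string;
  }
  return values;
}
```

In the app, one option is to load `.env` with `import 'dotenv/config'` and then call `requireEnv(process.env, ['ANTHROPIC_API_KEY', 'CRAWLFORGE_API_KEY'])` before creating any clients.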

Implementing the Research Flow

1. Query Understanding

First, we need to expand user queries into effective search terms.

Typescript
// src/research/query-processor.ts
import Anthropic from '@anthropic-ai/sdk';

interface QueryExpansion {
  original: string;
  searchTerms: string[];
  intent: 'factual' | 'comparative' | 'tutorial' | 'news';
  depth: 'shallow' | 'moderate' | 'deep';
}

export async function expandQuery(
  query: string,
  anthropic: Anthropic
): Promise<QueryExpansion> {
  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 500,
    messages: [{
      role: 'user',
      content: `Analyze this research query and return JSON:
Query: "${query}"

Return:
{
  "searchTerms": ["term1", "term2", "term3"],
  "intent": "factual|comparative|tutorial|news",
  "depth": "shallow|moderate|deep"
}

Search terms should be optimized for web search.`
    }]
  });

  const content = response.content[0];
  if (content.type !== 'text') throw new Error('Unexpected response');

  const parsed = JSON.parse(content.text);

  return {
    original: query,
    searchTerms: parsed.searchTerms,
    intent: parsed.intent,
    depth: parsed.depth
  };
}
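
Models sometimes wrap JSON in extra prose or return a label outside the allowed set, and a bare `JSON.parse` fails on the former and silently accepts the latter. The following is a hedged sketch of a more defensive parser; `parseExpansion` and its fallback defaults (`factual`, `moderate`) are assumptions for illustration, not part of the Anthropic SDK.

```typescript
type Intent = 'factual' | 'comparative' | 'tutorial' | 'news';
type Depth = 'shallow' | 'moderate' | 'deep';

const INTENTS: Intent[] = ['factual', 'comparative', 'tutorial', 'news'];
const DEPTHS: Depth[] = ['shallow', 'moderate', 'deep'];

// Hypothetical helper: parses the text between the first '{' and the last '}'
// of a model reply, then validates the fields the research flow relies on.
function parseExpansion(text: string): { searchTerms: string[]; intent: Intent; depth: Depth } {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model reply');
  }

  const raw = JSON.parse(text.slice(start, end + 1));

  // Keep only non-empty string terms.
  const searchTerms: string[] = Array.isArray(raw.searchTerms)
    ? raw.searchTerms.filter((t: unknown) => typeof t === 'string' && t.trim() !== '')
    : [];
  if (searchTerms.length === 0) {
    throw new Error('Model returned no search terms');
  }

  // Fall back to assumed defaults when the model returns an unknown label.
  const intent: Intent = INTENTS.indexOf(raw.intent) !== -1 ? raw.intent : 'factual';
  const depth: Depth = DEPTHS.indexOf(raw.depth) !== -1 ? raw.depth : 'moderate';
  return { searchTerms, intent, depth };
}
```

Calling `parseExpansion(content.text)` in place of `JSON.parse(content.text)` would keep `expandQuery` from returning undefined fields.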

2. Web Search and Content Extraction

Next, we search for relevant sources and extract content.

Typescript
// src/research/web-scraper.ts
interface Source {
  url: string;
  title: string;
  snippet: string;
  content: string;
  extractedAt: Date;
}

export async function findSources(
  searchTerms: string[],
  apiKey: string
): Promise<Source[]> {
  const sources: Source[] = [];

  for (const term of searchTerms) {
    // Use search_web tool (5 credits per search)
    const searchResponse = await fetch('https://crawlforge.dev/api/v1/tools/search_web', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        query: term,
        limit: 5  // Top 5 results per term
      })
    });

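    // Fail fast on auth, quota, or server errors instead of parsing an error body
    if (!searchResponse.ok) {
      throw new Error(`search_web failed for "${term}": HTTP ${searchResponse.status}`);
    }

    const searchData = await searchResponse.json();
    const results = searchData.results || [];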

    // Extract content from each result (2 credits per URL)
    for (const result of results) {
      const contentResponse = await fetch('https://crawlforge.dev/api/v1/tools/extract_content', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${apiKey}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          url: result.url
        })
      });

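      // Skip pages that fail to extract rather than aborting the whole run
      if (!contentResponse.ok) continue;

      const contentData = await contentResponse.json();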

      sources.push({
        url: result.url,
        title: result.title,
        snippet: result.snippet,
        content: contentData.content || '',
        extractedAt: new Date()
      });
    }
  }

  return sources;
}
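
Different search terms often return the same page, and the loop above would extract it once per term. The sketch below removes such duplicates before extraction; `dedupeResults` and `SearchResult` are hypothetical names, and treating URLs that differ only by fragment or trailing slash as the same page is an assumption that may not suit every site.

```typescript
interface SearchResult { url: string; title: string; snippet: string }

// Hypothetical helper: keeps the first result for each page, ignoring
// URL fragments and a trailing slash when comparing.
function dedupeResults(results: SearchResult[]): SearchResult[] {
  const seen = new Set<string>();
  const unique: SearchResult[] = [];

  for (const result of results) {
    let key = result.url;
    try {
      const parsed = new URL(result.url);
      parsed.hash = '';
      key = parsed.toString().replace(/\/$/, '');
    } catch {
      // Not a valid absolute URL: fall back to the raw string as the key.
    }

    if (!seen.has(key)) {
      seen.add(key);
      unique.push(result);
    }
  }
  return unique;
}
```

One way to use it is to collect the results for all search terms first, pass them through `dedupeResults`, and only then call `extract_content` for each remaining URL.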

Credit Cost:

  • 3 search terms × 5 credits = 15 credits
  • Up to 15 sources (3 terms × 5 results) × 2 credits = 30 credits
  • Total: up to 45 credits per research query
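
The same arithmetic can be written as a small estimator, which makes it easy to see how the cost changes with the number of terms or results. This is a sketch using the per-call prices quoted in this guide; `estimateCredits` and `queriesPerBalance` are hypothetical helpers, and it assumes every search returns the full number of results.

```typescript
// Per-call prices as quoted in this guide; check current pricing before relying on them.
const SEARCH_CREDITS = 5;
const EXTRACT_CREDITS = 2;

// Upper-bound cost of one research query.
function estimateCredits(searchTerms: number, resultsPerTerm: number): number {
  const searches = searchTerms * SEARCH_CREDITS;
  const extractions = searchTerms * resultsPerTerm * EXTRACT_CREDITS;
  return searches + extractions;
}

// How many full research queries a credit balance covers.
function queriesPerBalance(balance: number, searchTerms: number, resultsPerTerm: number): number {
  return Math.floor(balance / estimateCredits(searchTerms, resultsPerTerm));
}
```

With 3 terms and 5 results per term, `estimateCredits(3, 5)` gives 45, so the 1,000 free credits cover 22 such queries.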

3. Information Verification

Cross-reference facts across sources to verify accuracy.

Typescript
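// src/research/verifier.ts
import Anthropic from '@anthropic-ai/sdk';

// Only the fields this module reads; Source objects from web-scraper.ts satisfy it
interface Source {
  url: string;
  content: string;
}

interface VerifiedFact {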
  claim: string;
  confidence: 'high' | 'medium' | 'low';
  sources: string[];
  conflicts?: string[];
}

export async function verifyInformation(
  sources: Source[],
  anthropic: Anthropic
): Promise<VerifiedFact[]> {
  const sourceTexts = sources.map((s, i) =>
    `[Source ${i + 1}: ${s.url}]
${s.content.slice(0, 1000)}`
  ).join('\n\n');

  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 2000,
    messages: [{
      role: 'user',
      content: `Extract and verify key facts from these sources. Return JSON:

${sourceTexts}

In "sources", list the 1-based numbers of the sources that support the claim.

Return:
{
  "facts": [
    {
      "claim": "factual claim",
      "confidence": "high|medium|low",
      "sources": [1, 2],
      "conflicts": ["conflicting information if any"]
    }
  ]
}`
    }]
  });

  const content = response.content[0];
  if (content.type !== 'text') throw new Error('Unexpected response');

  const parsed = JSON.parse(content.text);

  return parsed.facts.map((fact: any) => ({
    claim: fact.claim,
    confidence: fact.confidence,
    // Map 1-based source numbers back to URLs, dropping any out-of-range index
    sources: fact.sources.map((i: number) => sources[i - 1]?.url).filter(Boolean),
    conflicts: fact.conflicts
  }));
}
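
The last step of the data flow is returning an answer with citations. The sketch below shows one simple way to turn verified facts into a cited list without another LLM call; `formatWithCitations` is a hypothetical helper, and dropping low-confidence facts is an assumption you may want to change.

```typescript
interface VerifiedFact {
  claim: string;
  confidence: 'high' | 'medium' | 'low';
  sources: string[];
  conflicts?: string[];
}

// Hypothetical formatter: numbers each distinct URL once and tags every
// kept claim with the [n] markers of the sources that support it.
function formatWithCitations(facts: VerifiedFact[]): string {
  const urls: string[] = [];

  const lines = facts
    .filter((fact) => fact.confidence !== 'low')
    .map((fact) => {
      const markers = fact.sources
        .filter((url) => url !== '')
        .map((url) => {
          if (urls.indexOf(url) === -1) urls.push(url);
          return `[${urls.indexOf(url) + 1}]`;
        });
      return `- ${fact.claim} ${markers.join('')}`.trim();
    });

  const references = urls.map((url, i) => `[${i + 1}] ${url}`);
  return lines.concat(['', 'Sources:'], references).join('\n');
}
```

Passing the output of `verifyInformation` to this function yields a bulleted list of claims followed by a numbered source list; a further Claude call could then rewrite that list into prose while keeping the markers.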

What's Next?

Now that you've built a basic research assistant, you can:

  1. Add streaming - Stream results as they're found for better UX
  2. Store results - Save research to a database for later retrieval
  3. Build a UI - Create a web interface with Next.js or React
  4. Add webhooks - Get notified when research completes
  5. Fine-tune prompts - Optimize for your specific use case

Resources

  • CrawlForge API Docs
  • Deep Research Tool
  • Credit Optimization Guide

Start building: Get 1,000 free credits at crawlforge.dev/signup.
