Imagine an AI research assistant that can:
- Search the web for relevant sources
- Extract and verify information from multiple websites
- Cross-reference facts for accuracy
- Synthesize findings into a coherent summary with citations
With Claude, the Model Context Protocol (MCP), and CrawlForge, you can build this in an afternoon. This guide walks you through the architecture, implementation, and production considerations.
The Vision: Research Like a Human
Without tools, an LLM is limited to its training data. When you ask GPT-4 or Claude a question, it can only recall what it saw during training. Humans don't work that way: we search, read, verify, and synthesize new information.
An AI research assistant should:
- Understand intent - Break down complex queries into searchable topics
- Discover sources - Find relevant web pages, documentation, articles
- Extract information - Pull out key facts, quotes, and data
- Verify accuracy - Cross-check information across multiple sources
- Synthesize results - Combine findings into a clear, cited answer
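These five responsibilities map naturally onto a pipeline of async stages. The sketch below assumes each stage is passed in as a function. The `Stages` interface and `runResearch` name are hypothetical and not part of any library. The concrete stages are built in the sections that follow.

```typescript
// Hypothetical stage signatures, one per responsibility listed above.
// S = a source document, F = a verified fact; both are left generic here.
interface Stages<S, F> {
  expand: (query: string) => Promise<string[]>; // understand intent -> search terms
  discover: (terms: string[]) => Promise<string[]>; // discover sources -> URLs
  extract: (urls: string[]) => Promise<S[]>; // extract information from each URL
  verify: (sources: S[]) => Promise<F[]>; // cross-check facts across sources
  synthesize: (query: string, facts: F[]) => Promise<string>; // cited answer
}

interface ResearchResult<S, F> {
  terms: string[];
  urls: string[];
  sources: S[];
  facts: F[];
  answer: string;
}

// Runs the stages in order, keeping intermediate results for debugging and citations.
async function runResearch<S, F>(
  query: string,
  stages: Stages<S, F>
): Promise<ResearchResult<S, F>> {
  const terms = await stages.expand(query);
  const urls = await stages.discover(terms);
  const sources = await stages.extract(urls);
  const facts = await stages.verify(sources);
  const answer = await stages.synthesize(query, facts);
  return { terms, urls, sources, facts, answer };
}
```

Injecting the stages keeps the orchestration testable with stubs. It also lets you swap one stage, such as the verifier, without touching the rest.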
Let's build it.
Architecture Overview
Our research assistant has three layers:
┌─────────────────────────────────────────────────┐
│ LLM Layer (Claude/GPT-4)                        │
│ - Query understanding                           │
│ - Source relevance scoring                      │
│ - Information synthesis                         │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│ MCP Server (CrawlForge)                         │
│ - search_web (5 credits)                        │
│ - extract_content (2 credits)                   │
│ - deep_research (10 credits)                    │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│ Web Data Layer                                  │
│ - Google Search results                         │
│ - Website content                               │
│ - Structured data                               │
└─────────────────────────────────────────────────┘
Data Flow:
- User submits research query
- LLM expands query into search terms
- CrawlForge searches the web and extracts content
- LLM verifies and synthesizes information
- Return structured answer with citations
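The last step in the flow returns an answer with citations. The sketch below assumes each fact carries a list of supporting URLs, as the `VerifiedFact` type later in this guide does. It numbers each distinct URL once and appends a source list. `formatWithCitations` and `CitedFact` are hypothetical names.

```typescript
// Assumed shape of a verified fact: the claim text plus the URLs supporting it.
interface CitedFact {
  claim: string;
  sources: string[];
}

// Turns facts into "claim [1][2]" lines followed by a numbered source list.
// Each distinct URL gets one number, assigned in order of first appearance.
function formatWithCitations(facts: CitedFact[]): string {
  const numbers = new Map<string, number>();
  const lines = facts.map((fact) => {
    const refs = fact.sources
      .filter((url) => url.length > 0) // ignore empty citations
      .map((url) => {
        if (!numbers.has(url)) numbers.set(url, numbers.size + 1);
        return `[${numbers.get(url)}]`;
      });
    return `${fact.claim} ${refs.join("")}`.trim();
  });
  const references = Array.from(numbers, ([url, n]) => `[${n}] ${url}`);
  return [...lines, "", "Sources:", ...references].join("\n");
}
```

Numbering by first appearance keeps the reference list in reading order, and a source cited by several claims appears only once.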
Setting Up the Project
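We'll use TypeScript, Claude's API (or OpenAI's), and CrawlForge. The examples call CrawlForge's HTTP endpoints directly with fetch, using the same tools that the CrawlForge MCP server exposes.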
Prerequisites
node -v # 18+ required
npm -v  # 9+ required
Initialize the Project
mkdir ai-research-assistant
cd ai-research-assistant
npm init -y
npm install @anthropic-ai/sdk dotenv
npm install --save-dev typescript @types/node tsx
npx tsc --init
Environment Setup
Create .env:
# Claude API (or use OPENAI_API_KEY)
ANTHROPIC_API_KEY=sk-ant-xxxxx
# CrawlForge API
CRAWLFORGE_API_KEY=cf_live_xxxxx
Get your CrawlForge API key at crawlforge.dev/signup (1,000 free credits).
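Before making any API calls, it is worth failing fast if a key is missing. The sketch below is a config loader that checks both variables from the `.env` file above. `loadConfig` and the `Config` field names are hypothetical. It assumes you use the Anthropic key, so adjust the required names if you use `OPENAI_API_KEY` instead.

```typescript
// Settings the rest of the app needs; field names are this sketch's own.
interface Config {
  anthropicApiKey: string;
  crawlforgeApiKey: string;
}

const REQUIRED_VARS = ["ANTHROPIC_API_KEY", "CRAWLFORGE_API_KEY"];

// Reads both keys from an environment-like object and throws, listing every
// missing variable, if any is absent or blank.
function loadConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    anthropicApiKey: env.ANTHROPIC_API_KEY as string,
    crawlforgeApiKey: env.CRAWLFORGE_API_KEY as string,
  };
}
```

In the app, put `import 'dotenv/config';` at the top of your entry file so `.env` is loaded. Then call `loadConfig(process.env)`.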
Implementing the Research Flow
1. Query Understanding
First, we need to expand user queries into effective search terms.
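The model's reply is free text. It may arrive wrapped in a code fence, or with an `intent` outside the expected set. The sketch below is a stricter parser that you could call instead of a bare `JSON.parse`. It extracts the outermost JSON object and validates each field against the `QueryExpansion` types. `parseExpansion` is a hypothetical helper.

```typescript
type Intent = "factual" | "comparative" | "tutorial" | "news";
type Depth = "shallow" | "moderate" | "deep";

const INTENTS: Intent[] = ["factual", "comparative", "tutorial", "news"];
const DEPTHS: Depth[] = ["shallow", "moderate", "deep"];

// Parses the model's reply into typed fields. Taking the span from the first "{"
// to the last "}" tolerates a ```json fence or surrounding prose.
// Throws on anything that does not match the expected schema.
function parseExpansion(text: string): { searchTerms: string[]; intent: Intent; depth: Depth } {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) throw new Error("No JSON object in model reply");
  const data = JSON.parse(text.slice(start, end + 1));

  // Keep only non-empty string search terms
  const terms: string[] = Array.isArray(data.searchTerms)
    ? data.searchTerms.filter((t: unknown) => typeof t === "string" && t.trim() !== "")
    : [];
  if (terms.length === 0) throw new Error("No usable search terms");
  if (!INTENTS.includes(data.intent)) throw new Error(`Unknown intent: ${data.intent}`);
  if (!DEPTHS.includes(data.depth)) throw new Error(`Unknown depth: ${data.depth}`);

  return { searchTerms: terms, intent: data.intent, depth: data.depth };
}
```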
// src/research/query-processor.ts
import Anthropic from '@anthropic-ai/sdk';
interface QueryExpansion {
original: string;
searchTerms: string[];
intent: 'factual' | 'comparative' | 'tutorial' | 'news';
depth: 'shallow' | 'moderate' | 'deep';
}
export async function expandQuery(
query: string,
anthropic: Anthropic
): Promise<QueryExpansion> {
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 500,
messages: [{
role: 'user',
content: `Analyze this research query and return only JSON, with no other text:
Query: "${query}"
Return:
{
"searchTerms": ["term1", "term2", "term3"],
"intent": "factual|comparative|tutorial|news",
"depth": "shallow|moderate|deep"
}
Search terms should be optimized for web search.`
}]
});
const content = response.content[0];
if (content.type !== 'text') throw new Error('Unexpected response');
const parsed = JSON.parse(content.text);
return {
original: query,
searchTerms: parsed.searchTerms,
intent: parsed.intent,
depth: parsed.depth
};
}
2. Web Search and Content Extraction
Next, we search for relevant sources and extract content.
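Both tool calls in this step share the same request shape: a POST with a bearer token and a JSON body. A small wrapper can factor that out and surface HTTP errors. The sketch below assumes the `https://crawlforge.dev/api/v1/tools/<tool>` URL pattern used in this guide. `callCrawlForgeTool` and `FetchLike` are hypothetical names. The fetch function is injected so it can be stubbed in tests.

```typescript
// Minimal structural type for fetch so a stub can stand in during tests;
// Node 18+'s global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const TOOL_BASE_URL = "https://crawlforge.dev/api/v1/tools"; // URL pattern from this guide

// POSTs a JSON payload to one tool endpoint and throws on non-2xx responses
// instead of silently parsing an error body as if it were data.
async function callCrawlForgeTool<T>(
  tool: string,
  payload: Record<string, unknown>,
  apiKey: string,
  fetchImpl: FetchLike
): Promise<T> {
  const response = await fetchImpl(`${TOOL_BASE_URL}/${tool}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error(`${tool} failed with HTTP ${response.status}`);
  }
  return (await response.json()) as T;
}
```

With this wrapper, a search inside `findSources` could become `await callCrawlForgeTool<{ results?: { url: string; title: string; snippet: string }[] }>('search_web', { query: term, limit: 5 }, apiKey, fetch)`.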
// src/research/web-scraper.ts
interface Source {
url: string;
title: string;
snippet: string;
content: string;
extractedAt: Date;
}
export async function findSources(
searchTerms: string[],
apiKey: string
): Promise<Source[]> {
const sources: Source[] = [];
for (const term of searchTerms) {
// Use search_web tool (5 credits per search)
const searchResponse = await fetch('https://crawlforge.dev/api/v1/tools/search_web', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
query: term,
limit: 5 // Top 5 results per term
})
});
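if (!searchResponse.ok) {
throw new Error(`search_web failed with HTTP ${searchResponse.status}`);
}
const searchData = await searchResponse.json();
const results = searchData.results || [];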
// Extract content from each result (2 credits per URL)
for (const result of results) {
const contentResponse = await fetch('https://crawlforge.dev/api/v1/tools/extract_content', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
url: result.url
})
});
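// Skip pages that fail to extract instead of passing empty content downstream
if (!contentResponse.ok) continue;
const contentData = await contentResponse.json();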
sources.push({
url: result.url,
title: result.title,
snippet: result.snippet,
content: contentData.content || '',
extractedAt: new Date()
});
}
}
return sources;
}
Credit Cost (assuming a query expands into 3 search terms and each search returns 5 results):
- 3 search terms × 5 credits = 15 credits
- 15 sources × 2 credits = 30 credits
- Total: 45 credits per research query
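The arithmetic above generalizes to any number of search terms and results per term. The sketch below assumes the per-call costs listed in this guide. It also assumes every search returns the full number of results, so the figure is an upper bound for `findSources`. `estimateCredits` and `queriesWithinBudget` are hypothetical helpers.

```typescript
// Per-call costs as listed in this guide; check current CrawlForge pricing before relying on them.
const CREDIT_COST = { search_web: 5, extract_content: 2 } as const;

// One search per term plus one extraction per returned result.
function estimateCredits(searchTermCount: number, resultsPerTerm: number): number {
  const searches = searchTermCount * CREDIT_COST.search_web;
  const extractions = searchTermCount * resultsPerTerm * CREDIT_COST.extract_content;
  return searches + extractions;
}

// How many complete research queries fit in a credit budget.
function queriesWithinBudget(budget: number, searchTermCount: number, resultsPerTerm: number): number {
  return Math.floor(budget / estimateCredits(searchTermCount, resultsPerTerm));
}
```

For example, `queriesWithinBudget(1000, 3, 5)` returns 22. At this configuration, the 1,000 free credits cover about 22 queries (22 × 45 = 990 credits). Lowering `limit` in the search request reduces cost linearly.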
3. Information Verification
Cross-reference facts across sources to verify accuracy.
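The verifier below asks the model to judge confidence. As a deterministic cross-check, you could use a swapped-in heuristic. This is not the method this guide uses. The heuristic rates a claim by how many distinct websites support it, so several pages from one site count only once. `confidenceFromSources` and its thresholds (3+ sites for high, 2 for medium) are assumptions for illustration.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical heuristic: confidence grows with the number of distinct hostnames
// ("www." stripped) among the supporting URLs.
function confidenceFromSources(urls: string[]): Confidence {
  const hosts = new Set<string>();
  for (const url of urls) {
    try {
      hosts.add(new URL(url).hostname.replace(/^www\./, ""));
    } catch {
      // Ignore empty or malformed URLs rather than counting them as support.
    }
  }
  if (hosts.size >= 3) return "high";
  if (hosts.size === 2) return "medium";
  return "low";
}
```

You could run it on the URLs in each `VerifiedFact` and flag cases where the model's `confidence` is higher than the number of independent sites supports.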
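// src/research/verifier.ts
import Anthropic from '@anthropic-ai/sdk';
// The fields this module reads; Source objects returned by findSources match this shape
interface Source {
url: string;
content: string;
}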
interface VerifiedFact {
claim: string;
confidence: 'high' | 'medium' | 'low';
sources: string[];
conflicts?: string[];
}
export async function verifyInformation(
sources: Source[],
anthropic: Anthropic
): Promise<VerifiedFact[]> {
const sourceTexts = sources.map((s, i) =>
`[Source ${i + 1}: ${s.url}]
${s.content.slice(0, 1000)}`
).join('\n\n');
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 2000,
messages: [{
role: 'user',
content: `Extract and verify key facts from these sources. Return only JSON, with no other text.
${sourceTexts}
Return:
{
"facts": [
{
"claim": "factual claim",
"confidence": "high|medium|low",
"sources": [1, 2],
"conflicts": ["conflicting information if any"]
}
]
}
"sources" holds the numbers of the sources that support each claim.`
}]
});
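const content = response.content[0];
if (content.type !== 'text') throw new Error('Unexpected response');
// Tolerate a code fence or stray prose around the JSON object
const json = content.text.slice(content.text.indexOf('{'), content.text.lastIndexOf('}') + 1);
const parsed = JSON.parse(json);
return (parsed.facts ?? []).map((fact: any) => ({
claim: fact.claim,
confidence: fact.confidence,
// Map 1-based source numbers back to URLs, dropping out-of-range indices
sources: (fact.sources ?? [])
.map((i: number) => sources[i - 1]?.url)
.filter((url: string | undefined): url is string => Boolean(url)),
conflicts: fact.conflicts
}));
}
What's Next?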
Now that you've built a basic research assistant, you can:
- Add streaming - Stream results as they're found for better UX
- Store results - Save research to a database for later retrieval
- Build a UI - Create a web interface with Next.js or React
- Add webhooks - Get notified when research completes
- Fine-tune prompts - Optimize for your specific use case
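For the streaming idea, an async generator is one natural fit. It yields each search term's results as soon as that search finishes, instead of returning one array at the end. The sketch below is minimal. `streamSources`, `FoundSource`, and the injected `search` function are hypothetical.

```typescript
// Minimal shape of a streamed result; field names are assumptions for this sketch.
interface FoundSource {
  term: string;
  url: string;
}

// Yields each term's URLs as soon as that term's search resolves, so a UI can
// render results incrementally. Terms are searched one after another.
async function* streamSources(
  terms: string[],
  search: (term: string) => Promise<string[]> // returns result URLs for one term
): AsyncGenerator<FoundSource> {
  for (const term of terms) {
    const urls = await search(term);
    for (const url of urls) {
      yield { term, url };
    }
  }
}
```

A caller consumes it with `for await (const source of streamSources(terms, search)) { ... }` and can forward each item to the client as it arrives.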
Resources
Start building: Get 1,000 free credits at crawlforge.dev/signup.