What is the difference between MCP tools, resources, and prompts?

Tools are model-invoked functions that do work or cause side effects (a scrape_page tool). Resources are read-only data exposed by URI for context. Prompts are reusable, user-triggered message templates. A scraping server almost always exposes its capability as a tool.

How do I test an MCP server locally?

Use the MCP Inspector: run npx @modelcontextprotocol/inspector node build/index.js. It opens an interactive UI where you can list tools, set inputs and environment variables, call your tool, and watch the JSON-RPC log -- without wiring it into a full client first.

Why does my stdio MCP server hang or return broken responses?

The most common cause is logging to stdout. A stdio server uses stdout for the JSON-RPC protocol stream, so any console.log corrupts it. Log to stderr (console.error) or to a file instead. Also use absolute paths in your client config and make sure the project is ESM (type: module).

Should I build my own scraper or use a managed scraping API?

Build your own to learn MCP or for simple, static, well-behaved sites. Reach for a managed scraping server (CrawlForge, Firecrawl, and others) once you hit JavaScript rendering, anti-bot systems, proxy rotation, and rate limiting -- the maintenance burden of those is far larger than the protocol code.

Which @modelcontextprotocol/sdk version should I use?

Use the v1.x line (currently 1.29.x) -- the maintainers recommend it for production. A v2 line that splits into separate server and client packages is in pre-alpha; once it stabilizes the import paths change, but the McpServer, registerTool, and transport concepts carry over.

Can I build an MCP server in Python instead of TypeScript?

Yes. The official Python SDK (the mcp package, with FastMCP and an @mcp.tool() decorator) implements the same protocol and needs Python 3.10 or newer. Pick the language that matches your stack -- the MCP concepts are identical.

How to Build a Web-Scraping MCP Server in TypeScript (2026)

The Model Context Protocol (MCP) is how AI assistants like Claude call external tools. If you want your assistant to scrape the web, one option is to build your own MCP server. This guide walks through a minimal, working web-scraping MCP server in TypeScript -- and is honest about where the protocol ends and the hard part of scraping begins.

By the end you will have a server Claude can call, plus a clear sense of when building your own is the right move versus pointing your agent at a managed scraping server.

What an MCP Server Actually Is
Prerequisites
Step 1: Project Setup
Step 2: A Minimal MCP Server
Step 3: Add a Real Scraping Tool
Step 4: Test With the MCP Inspector
Step 5: Connect It to Claude Desktop
The Hard Part: Production Scraping
Build vs. Buy
Gotchas

What an MCP Server Actually Is

An MCP server exposes three kinds of capability to an AI client:

Tools -- model-invoked functions that do work or cause side effects. A scrape_page tool is one.
Resources -- read-only data exposed by URI for context, no heavy computation.
Prompts -- reusable, user-triggered message templates.

For scraping, you want a tool. The client (Claude Desktop, Claude Code, Cursor) shows your tool to the model; when a prompt needs live data, the model emits a structured tool call, your server runs it, and the result flows back. For deeper protocol background, see MCP vs REST.
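To make that call flow concrete, here is a rough sketch of the JSON-RPC 2.0 messages behind a single tool call. The tools/call method name and the content / isError result fields follow the MCP specification; the handleToolCall dispatcher and the stubbed scrape_page handler are hypothetical stand-ins for work the SDK does for you, not its actual internals.

```typescript
// Illustrative shapes only: the SDK builds and parses these messages for you.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result: ToolResult;
}

type ToolHandler = (args: Record<string, unknown>) => ToolResult;

// Hypothetical registry with a stubbed handler (no network access).
const handlers: Record<string, ToolHandler> = {
  scrape_page: (args) => ({
    content: [{ type: "text", text: `<html>stub for ${String(args.url)}</html>` }],
  }),
};

// Route a request to its handler and echo the request id in the response.
function handleToolCall(req: ToolCallRequest): ToolCallResponse {
  const handler = handlers[req.params.name];
  // Simplification for this sketch: the MCP spec reports an unknown tool as a
  // protocol-level JSON-RPC error rather than as a tool result.
  const result: ToolResult = handler
    ? handler(req.params.arguments)
    : { content: [{ type: "text", text: `Unknown tool: ${req.params.name}` }], isError: true };
  return { jsonrpc: "2.0", id: req.id, result };
}

// What the client sends when the model decides to call the tool.
const request: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "scrape_page", arguments: { url: "https://example.com" } },
};

const response = handleToolCall(request);
```

The response carries the same id as the request, which is how the client matches results to calls when several are in flight.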

Prerequisites

Node.js 18+ (check with node --version)
Basic TypeScript familiarity
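The official SDK and a schema library, plus the TypeScript compiler and Node.js type definitions for the build step:

Bash

npm install @modelcontextprotocol/sdk zod
npm install --save-dev typescript @types/node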

This tutorial targets @modelcontextprotocol/sdk v1.x (currently 1.29.x), which the maintainers recommend for production. A v2 line that splits into @modelcontextprotocol/server and @modelcontextprotocol/client is in pre-alpha; once it stabilizes the imports change, but the concepts below carry over.

Step 1: Project Setup

Create a package.json and mark the project as an ES module; the code below relies on import syntax and top-level await:

Json

{
  "name": "scraper-mcp",
  "version": "1.0.0",
  "type": "module",
  "bin": { "scraper-mcp": "build/index.js" },
  "scripts": { "build": "tsc" }
}

A minimal tsconfig.json targeting ES2022:

Json

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "build",
    "strict": true
  },
  "include": ["src/**/*.ts"]
}

Step 2: A Minimal MCP Server

Here is the smallest server that registers one tool and talks to a client over stdio. The .js extensions on the imports are required under ESM:

Typescript

// src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "scraper", version: "1.0.0" });

server.registerTool(
  "scrape_page",
  {
    title: "Scrape Page",
    description: "Fetch a URL and return its raw HTML",
    inputSchema: { url: z.string().url() }, // raw Zod shape, not z.object(...)
  },
  async ({ url }) => {
    const res = await fetch(url);
    const html = await res.text();
    return { content: [{ type: "text", text: html }] };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

That is a complete MCP server. The current API is server.registerTool(name, config, handler); the inputSchema is a raw Zod shape ({ url: z.string() }), not a wrapped z.object(). Build it with npm run build.
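A syntactic URL check does not necessarily restrict the scheme, so a handler that fetches whatever it is given may want a stricter guard. The sketch below uses the standard URL class; parseHttpUrl is a hypothetical helper, not part of the SDK or Zod.

```typescript
// Hypothetical helper: accept only absolute http(s) URLs.
function parseHttpUrl(input: string): URL | null {
  let parsed: URL;
  try {
    parsed = new URL(input); // throws on malformed or relative input
  } catch {
    return null;
  }
  // Reject file:, ftp:, data: and any other non-web scheme.
  return parsed.protocol === "http:" || parsed.protocol === "https:" ? parsed : null;
}
```

Inside the tool handler you could call parseHttpUrl(url) first and return an isError result when it yields null, so the server never fetches a non-web URL.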

Step 3: Add a Real Scraping Tool

Returning raw HTML is rarely useful -- you want clean text. Add an HTML parser:

Bash

npm install cheerio

Typescript

import * as cheerio from "cheerio";

server.registerTool(
  "scrape_text",
  {
    title: "Scrape Visible Text",
    description: "Fetch a URL and return readable text, stripped of scripts and chrome",
    inputSchema: { url: z.string().url() },
  },
  async ({ url }) => {
    const res = await fetch(url, { headers: { "User-Agent": "scraper-mcp/1.0" } });
    if (!res.ok) {
      return {
        content: [{ type: "text", text: "Request failed with status " + res.status }],
        isError: true,
      };
    }
    const $ = cheerio.load(await res.text());
    $("script, style, nav, footer").remove();
    const text = $("body").text().replace(/\s+/g, " ").trim();
    return { content: [{ type: "text", text: text.slice(0, 8000) }] };
  }
);

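Return isError: true on failures so the model can react instead of treating an error string as page content. Register this tool in src/index.ts before the server.connect(transport) call, then rebuild with npm run build. cheerio parses static HTML quickly on the server, but it does not execute JavaScript, a limitation covered in The Hard Part: Production Scraping.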
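The handler above caps output with text.slice(0, 8000), which can cut a word in half and gives the model no hint that anything is missing. One possible refinement, sketched here with a hypothetical cleanAndTruncate helper and an arbitrary "[truncated]" marker, is to cut on a word boundary and flag the truncation:

```typescript
// Hypothetical helper: collapse whitespace, then cap length on a word boundary.
function cleanAndTruncate(
  raw: string,
  maxChars = 8000
): { text: string; truncated: boolean } {
  const text = raw.replace(/\s+/g, " ").trim();
  if (text.length <= maxChars) {
    return { text, truncated: false };
  }
  const cut = text.slice(0, maxChars);
  const lastSpace = cut.lastIndexOf(" ");
  // Fall back to a hard cut when the slice contains no space at all.
  const body = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return { text: body + " [truncated]", truncated: true };
}
```

In the tool you would pass $("body").text() through this helper and return its text field; the limit is a judgment call that depends on how much context you want one page to consume.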

Step 4: Test With the MCP Inspector

Before wiring anything into Claude, test in isolation with the MCP Inspector:

Bash

npx @modelcontextprotocol/inspector node build/index.js

It opens an interactive UI with Tools, Resources, and Prompts tabs. Select scrape_text, enter a URL, and run it -- you will see the response plus a live JSON-RPC log. You can also set environment variables per server here, which is how you would test an API key later.

Step 5: Connect It to Claude Desktop

Edit Claude Desktop's config (create the file if it does not exist):

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

Add your server under the mcpServers key, using an absolute path to the built file:

Json

{
  "mcpServers": {
    "scraper": {
      "command": "node",
      "args": ["/ABSOLUTE/PATH/TO/build/index.js"],
      "env": { "SCRAPER_API_KEY": "optional" }
    }
  }
}

Quit Claude Desktop completely and reopen it. Your scrape_text tool now appears, and you can ask Claude in plain English to scrape a page.
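If you would rather not hand-type the absolute path, a small script can generate the entry for you. This is only a sketch: buildServerEntry is a hypothetical helper, and it assumes you run it from the project root so that build/index.js resolves correctly.

```typescript
import { isAbsolute, resolve } from "node:path";

interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Hypothetical helper: build one mcpServers entry with an absolute script path.
function buildServerEntry(
  scriptPath: string,
  env?: Record<string, string>
): McpServerEntry {
  // resolve() anchors relative paths at the current working directory.
  const absolute = isAbsolute(scriptPath) ? scriptPath : resolve(scriptPath);
  return env
    ? { command: "node", args: [absolute], env }
    : { command: "node", args: [absolute] };
}

const config = { mcpServers: { scraper: buildServerEntry("build/index.js") } };
const json = JSON.stringify(config, null, 2);
```

Printing json gives a block you can paste into claude_desktop_config.json, merging it with any servers already listed there.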

The Hard Part: Production Scraping

The protocol code above is the easy 20%. Real-world scraping is the other 80%, and none of it is MCP-specific:

JavaScript rendering. cheerio only sees the initial HTML. Single-page apps render content client-side, so you need a headless browser like Playwright or Puppeteer -- which is slow, memory-hungry, and a deployment headache.
Anti-bot systems. Cloudflare, WAFs, CAPTCHAs, and browser fingerprinting block naive HTTP clients. Getting past them is a continuous arms race.
Proxies. Avoiding IP bans at any meaningful volume means rotating residential or datacenter proxy pools.
Rate limits, retries, and backoff. Polite, resilient crawling needs concurrency control and exponential backoff.
Parsing brittleness. Selectors break the moment a target site changes its markup.
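As one small illustration of the retry-and-backoff item in the list above, here is a sketch of an exponential backoff schedule with optional jitter. The option names, the doubling factor, and the set of retryable status codes are all assumptions chosen for the example, not recommendations for any particular site.

```typescript
// Exponential backoff with optional "full jitter"; all names are illustrative.
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number; // upper bound for any single delay
  retries: number; // number of retry attempts
  jitter?: boolean; // randomize each delay between 0 and its ceiling
  random?: () => number; // injectable for tests; defaults to Math.random
}

function backoffDelays(opts: BackoffOptions): number[] {
  const random = opts.random ?? Math.random;
  const delays: number[] = [];
  for (let attempt = 0; attempt < opts.retries; attempt++) {
    // Double the delay on each attempt, but never exceed maxMs.
    const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
    delays.push(opts.jitter ? Math.floor(random() * ceiling) : ceiling);
  }
  return delays;
}

// Status codes that are usually worth retrying (an assumption; tune per target).
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}
```

For example, backoffDelays({ baseMs: 500, maxMs: 4000, retries: 5 }) yields 500, 1000, 2000, 4000, 4000 milliseconds. A fetch wrapper would wait for each delay in turn while isRetryable(res.status) holds, then give up and return an isError result.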

Build vs. Buy

Build your own MCP server to learn the protocol, or for simple, static, well-behaved internal sites. Once you hit JavaScript rendering, anti-bot defenses, and proxy rotation, the maintenance burden dwarfs the protocol code -- which is exactly why most teams point their agents at a managed scraping MCP server instead.

That is the niche CrawlForge fills: 26 ready-made tools including scrape_with_actions (headless browser), stealth_mode (anti-bot), batch_scrape, and deep_research, with JS rendering, proxies, and rate limiting handled for you. You install it the same way you would your own server -- see adding web scraping to Claude Desktop -- but skip the browser-and-proxy treadmill. For how the managed options stack up, see the best MCP servers for web scraping.

Gotchas

Never log to stdout on a stdio server. stdout carries the JSON-RPC stream, so a stray console.log corrupts it and breaks the server. Log to stderr with console.error, or to a file.
Use absolute paths in claude_desktop_config.json -- relative paths fail.
Stay ESM: "type": "module" in package.json and .js extensions on SDK imports, with target: ES2022.
Restart the client after any config change.
Use Node.js 18 or newer.
Prefer v1.x of the SDK for production; treat v2 as experimental for now.
Python instead? The official Python SDK (the mcp package, with FastMCP and an @mcp.tool() decorator) speaks the same protocol and needs Python 3.10+.
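To see why the stdout rule at the top of this list matters, recall that the stdio transport sends one JSON-RPC message per line. The sketch below is a deliberately simplified model of a client reading that stream; readFrames is a hypothetical stand-in, not the SDK's reader.

```typescript
// Simplified model of a client reading a stdio server's stdout:
// one JSON-RPC message per line.
function readFrames(stdout: string): { messages: unknown[]; corrupt: string[] } {
  const messages: unknown[] = [];
  const corrupt: string[] = [];
  for (const line of stdout.split("\n")) {
    if (line.trim() === "") continue;
    try {
      messages.push(JSON.parse(line));
    } catch {
      corrupt.push(line); // a real client may report an error or disconnect here
    }
  }
  return { messages, corrupt };
}

const clean = '{"jsonrpc":"2.0","id":1,"result":{}}\n';
// The first line is what a stray console.log would add to the stream.
const polluted = "fetching https://example.com\n" + clean;

const cleanFrames = readFrames(clean);
const pollutedFrames = readFrames(polluted);
```

The polluted stream contains a line that is not valid JSON, which is exactly what a console.log call injects. Writing the same message with console.error keeps it on stderr and out of the protocol stream.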

Skip the maintenance -- start free with CrawlForge and get 26 production scraping tools with 1,000 credits, no credit card.
