CrawlForge
HomePlaygroundUse CasesIntegrationsPricingDocumentationBlog
How to Build a Web-Scraping MCP Server in TypeScript (2026)
Tutorials
Back to Blog
Tutorials

How to Build a Web-Scraping MCP Server in TypeScript (2026)

C
CrawlForge Team
Engineering Team
June 16, 2026
12 min read

On this page

Quick Answer

You can build a working web-scraping MCP server in TypeScript in under 50 lines with the official @modelcontextprotocol/sdk: create an McpServer, register a tool with a Zod input schema, fetch and parse the page with cheerio, and connect over stdio. The protocol is the easy part -- production scraping (JavaScript rendering, anti-bot bypass, proxies, retries) is why many teams point their agents at a managed scraping server like CrawlForge instead.

The Model Context Protocol (MCP) is how AI assistants like Claude call external tools. If you want your assistant to scrape the web, one option is to build your own MCP server. This guide walks through a minimal, working web-scraping MCP server in TypeScript -- and is honest about where the protocol ends and the hard part of scraping begins.

By the end you will have a server Claude can call, plus a clear sense of when building your own is the right move versus pointing your agent at a managed scraping server.

Table of Contents

  • What an MCP Server Actually Is
  • Prerequisites
  • Step 1: Project Setup
  • Step 2: A Minimal MCP Server
  • Step 3: Add a Real Scraping Tool
  • Step 4: Test With the MCP Inspector
  • Step 5: Connect It to Claude Desktop
  • The Hard Part: Production Scraping
  • Build vs. Buy
  • Gotchas

What an MCP Server Actually Is

An MCP server exposes three kinds of capability to an AI client:

  • Tools -- model-invoked functions that do work or cause side effects. A scrape_page tool is one.
  • Resources -- read-only data exposed by URI for context, no heavy computation.
  • Prompts -- reusable, user-triggered message templates.

For scraping, you want a tool. The client (Claude Desktop, Claude Code, Cursor) shows your tool to the model; when a prompt needs live data, the model emits a structured tool call, your server runs it, and the result flows back. For deeper protocol background, see MCP vs REST.

Prerequisites

  • Node.js 18+ (check with node --version)
  • Basic TypeScript familiarity
  • The official SDK and a schema library:
Bash
npm install @modelcontextprotocol/sdk zod

This tutorial targets @modelcontextprotocol/sdk v1.x (currently 1.29.x), which the maintainers recommend for production. A v2 line that splits into @modelcontextprotocol/server and @modelcontextprotocol/client is in pre-alpha; once it stabilizes the imports change, but the concepts below carry over.

Step 1: Project Setup

Create a project and mark it as an ES module -- the SDK is ESM-only:

Json
// package.json
{
  "name": "scraper-mcp",
  "version": "1.0.0",
  "type": "module",
  "bin": { "scraper-mcp": "build/index.js" },
  "scripts": { "build": "tsc" }
}

A minimal tsconfig.json targeting ES2022:

Json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "build",
    "strict": true
  },
  "include": ["src/**/*.ts"]
}

Step 2: A Minimal MCP Server

Here is the smallest server that registers one tool and talks to a client over stdio. The .js extensions on the imports are required under ESM:

Typescript
// src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "scraper", version: "1.0.0" });

server.registerTool(
  "scrape_page",
  {
    title: "Scrape Page",
    description: "Fetch a URL and return its raw HTML",
    inputSchema: { url: z.string().url() }, // raw Zod shape, not z.object(...)
  },
  async ({ url }) => {
    const res = await fetch(url);
    const html = await res.text();
    return { content: [{ type: "text", text: html }] };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

That is a complete MCP server. The current API is server.registerTool(name, config, handler); the inputSchema is a raw Zod shape ({ url: z.string() }), not a wrapped z.object(). Build it with npm run build.

Step 3: Add a Real Scraping Tool

Returning raw HTML is rarely useful -- you want clean text. Add an HTML parser:

Bash
npm install cheerio
Typescript
import * as cheerio from "cheerio";

server.registerTool(
  "scrape_text",
  {
    title: "Scrape Visible Text",
    description: "Fetch a URL and return readable text, stripped of scripts and chrome",
    inputSchema: { url: z.string().url() },
  },
  async ({ url }) => {
    const res = await fetch(url, { headers: { "User-Agent": "scraper-mcp/1.0" } });
    if (!res.ok) {
      return {
        content: [{ type: "text", text: "Request failed with status " + res.status }],
        isError: true,
      };
    }
    const $ = cheerio.load(await res.text());
    $("script, style, nav, footer").remove();
    const text = $("body").text().replace(/\s+/g, " ").trim();
    return { content: [{ type: "text", text: text.slice(0, 8000) }] };
  }
);

Return isError: true on failures so the model can react instead of treating an error string as page content. cheerio parses static HTML server-side, fast -- but note what it cannot do, which we get to below.

Step 4: Test With the MCP Inspector

Before wiring anything into Claude, test in isolation with the MCP Inspector:

Bash
npx @modelcontextprotocol/inspector node build/index.js

It opens an interactive UI with Tools, Resources, and Prompts tabs. Select scrape_text, enter a URL, and run it -- you will see the response plus a live JSON-RPC log. You can also set environment variables per server here, which is how you would test an API key later.

Step 5: Connect It to Claude Desktop

Edit Claude Desktop's config (create the file if it does not exist):

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Add your server under the mcpServers key, using an absolute path to the built file:

Json
{
  "mcpServers": {
    "scraper": {
      "command": "node",
      "args": ["/ABSOLUTE/PATH/TO/build/index.js"],
      "env": { "SCRAPER_API_KEY": "optional" }
    }
  }
}

Quit Claude Desktop completely and reopen it. Your scrape_text tool now appears, and you can ask Claude in plain English to scrape a page.

The Hard Part: Production Scraping

The protocol code above is the easy 20%. Real-world scraping is the other 80%, and none of it is MCP-specific:

  • JavaScript rendering. cheerio only sees the initial HTML. Single-page apps render content client-side, so you need a headless browser like Playwright or Puppeteer -- which is slow, memory-hungry, and a deployment headache.
  • Anti-bot systems. Cloudflare, WAFs, CAPTCHAs, and browser fingerprinting block naive HTTP clients. Getting past them is a continuous arms race.
  • Proxies. Avoiding IP bans at any volume means rotating residential or datacenter proxy pools.
  • Rate limits, retries, and backoff. Polite, resilient crawling needs concurrency control and exponential backoff.
  • Parsing brittleness. Selectors break the moment a target site changes its markup.

Build vs. Buy

Build your own MCP server to learn the protocol, or for simple, static, well-behaved internal sites. Once you hit JavaScript rendering, anti-bot defenses, and proxy rotation, the maintenance burden dwarfs the protocol code -- which is exactly why most teams point their agents at a managed scraping MCP server instead.

That is the niche CrawlForge fills: 26 ready-made tools including scrape_with_actions (headless browser), stealth_mode (anti-bot), batch_scrape, and deep_research, with JS rendering, proxies, and rate limiting handled for you. You install it the same way you would your own server -- see adding web scraping to Claude Desktop -- but skip the browser-and-proxy treadmill. For how the managed options stack up, see the best MCP servers for web scraping.

Gotchas

  • Never log to stdout on a stdio server. stdout carries the JSON-RPC stream, so a stray console.log corrupts it and breaks the server. Log to stderr with console.error, or to a file.
  • Use absolute paths in claude_desktop_config.json -- relative paths fail.
  • Stay ESM: "type": "module" in package.json and .js extensions on SDK imports, with target: ES2022.
  • Restart the client after any config change. Use Node.js 18 or newer.
  • Prefer v1.x of the SDK for production; treat v2 as experimental for now.
  • Python instead? The official Python SDK (the mcp package, with FastMCP and an @mcp.tool() decorator) speaks the same protocol and needs Python 3.10+.

Skip the maintenance -- start free with CrawlForge and get 26 production scraping tools with 1,000 credits, no credit card.

Try this yourself — no signup needed

Run any of CrawlForge's 27 scraping and extraction tools in the playground, then start free with 1,000 credits.

1,000 free credits • Refills monthly • No credit card required

Tags

mcptutorialtypescriptmcp-serverweb-scrapingclaude

About the Author

C

CrawlForge Team

Engineering Team

Building the most comprehensive web scraping MCP server. We create tools that help developers extract, analyze, and transform web data for AI applications.

Stay updated with the latest insights

Get tutorials, product updates, and web scraping tips delivered to your inbox.

No spam. Unsubscribe anytime.

Put this into practice

Test CrawlForge's tools on any URL — free, no signup.

On this page

Frequently Asked Questions

What is the difference between MCP tools, resources, and prompts?+

Tools are model-invoked functions that do work or cause side effects (a scrape_page tool). Resources are read-only data exposed by URI for context. Prompts are reusable, user-triggered message templates. A scraping server almost always exposes its capability as a tool.

How do I test an MCP server locally?+

Use the MCP Inspector: run npx @modelcontextprotocol/inspector node build/index.js. It opens an interactive UI where you can list tools, set inputs and environment variables, call your tool, and watch the JSON-RPC log -- without wiring it into a full client first.

Why does my stdio MCP server hang or return broken responses?+

The most common cause is logging to stdout. A stdio server uses stdout for the JSON-RPC protocol stream, so any console.log corrupts it. Log to stderr (console.error) or to a file instead. Also use absolute paths in your client config and make sure the project is ESM (type: module).

Should I build my own scraper or use a managed scraping API?+

Build your own to learn MCP or for simple, static, well-behaved sites. Reach for a managed scraping server (CrawlForge, Firecrawl, and others) once you hit JavaScript rendering, anti-bot systems, proxy rotation, and rate limiting -- the maintenance burden of those is far larger than the protocol code.

Which @modelcontextprotocol/sdk version should I use?+

Use the v1.x line (currently 1.29.x) -- the maintainers recommend it for production. A v2 line that splits into separate server and client packages is in pre-alpha; once it stabilizes the import paths change, but the McpServer, registerTool, and transport concepts carry over.

Can I build an MCP server in Python instead of TypeScript?+

Yes. The official Python SDK (the mcp package, with FastMCP and an @mcp.tool() decorator) implements the same protocol and needs Python 3.10 or newer. Pick the language that matches your stack -- the MCP concepts are identical.

Related Articles

How to Use CrawlForge with LangGraph Agents
Tutorials

How to Use CrawlForge with LangGraph Agents

Build stateful web scraping agents with LangGraph and CrawlForge. TypeScript guide covering graph nodes, state management, and conditional scraping flows.

C
CrawlForge Team
|
Apr 24
|
8m
How to Use CrawlForge with Mastra AI Agents
Tutorials

How to Use CrawlForge with Mastra AI Agents

Build AI agents with web scraping capabilities using Mastra and CrawlForge. TypeScript setup guide with tool integration, workflows, and agent examples.

C
CrawlForge Team
|
Apr 21
|
7m
How to Give ChatGPT Web Scraping with MCP Connectors (2026)
Tutorials

How to Give ChatGPT Web Scraping with MCP Connectors (2026)

Connect ChatGPT to web scraping with custom MCP connectors -- why CrawlForge needs a remote wrapper, the FastMCP bridge to build, and how to add it in ChatGPT.

C
CrawlForge Team
|
Jun 16
|
11m

Footer

CrawlForge

Enterprise web scraping for AI Agents. 27 specialized MCP tools designed for modern developers building intelligent systems.

Product

  • Features
  • Playground
  • Pricing
  • Use Cases
  • Integrations
  • Alternatives
  • Changelog

Resources

  • Getting Started
  • API Reference
  • Templates
  • Guides
  • Blog
  • Glossary
  • FAQ
  • Sitemap

Developers

  • MCP Protocol
  • Claude Desktop
  • Cursor IDE
  • LangChain
  • LlamaIndex

Company

  • About
  • Contact
  • Privacy
  • Terms
  • Acceptable Use
  • Cookies

Stay updated

Get the latest updates on new tools and features.

Built with Next.js and MCP protocol

© 2025-2026 CrawlForge. All rights reserved.