If you have ever written a Claude Code skill and watched Claude blow right past it, v4.8.0 is for you. CrawlForge MCP Server v4.8.0 ships seven real, auto-activating Claude Agent Skills for web scraping that map plain-English prompts to the server's 26 tools — so Claude loads the right scraping, crawling, or research tool without you spelling out which one to call.
This is a fully additive minor release (shipped June 28, 2026). The tool count stays at 26, and no tool schema, output shape, or credit cost changes for existing callers. Alongside the skills, we wired in genuinely-enforced SSRF protection, fixed MCP confirmation prompts that were silently failing open, made the screenshot scrape format actually work, added a design-token branding format, and shipped real built-in scheduled change monitoring.
Table of contents
- What shipped in v4.8.0
- Seven auto-activating Claude Agent Skills for web scraping
- Two new scrape formats: branding and screenshot
- Built-in scheduled change monitoring
- Security hardening: controls that were advertised but silently broken
- Pricing: 26 metered tools, no new tools, no free lunch
- How to upgrade
- What is next
What shipped in v4.8.0
One-line summary: real auto-activating Claude Agent Skills, two new scrape formats, genuinely-enforced security controls, and working scheduled change monitoring.
There is exactly one behavior change to flag up front, Stripe-changelog style: clients that support MCP elicitation will now actually see the cost and safety confirmations (for example, deep research over 50 URLs, batch scrape, and deep crawl) that previously failed to appear. Everything else is purely additive.
Install or upgrade in one line:
```bash
npm install -g crawlforge-mcp-server@latest
npx crawlforge init
```

Here is the scannable changelog:
| Type | Change |
|---|---|
| Added | 7 auto-activating Claude Agent Skills covering all 26 tools |
| Added | scrape format branding (design tokens, no browser) |
| Added | scrape format screenshot now renders (was a no-op) |
| Added | Scheduled change monitoring: create/list/stop + CLI cron |
| Fixed | SSRF protection wired into the live scraping path |
| Fixed | MCP elicitation confirmations now fire (were silent no-ops) |
| Security | Per-host outbound rate limiting + executeJavaScript hardening |
Seven auto-activating Claude Agent Skills for web scraping
The old approach shipped bare reference-markdown files that Claude Code never actually loaded. If you have ever dropped a skill file on disk and watched Claude ignore it, you know the failure mode: the file is there, but nothing tells the model when it is relevant, so you end up spelling out "use the stealth_mode tool to scrape this" — which defeats the point.
So we rebuilt them properly. A skill is now a directory containing a SKILL.md file with YAML frontmatter. At startup Claude pre-loads only the name and description of every installed skill, then reads the full body only when it judges the skill relevant to your prompt. Anthropic calls this progressive disclosure — skills are not always-loaded context, they are loaded on demand.
```markdown
---
name: crawlforge-web-scraping
description: >-
  Scrape, crawl, and extract content from websites and return clean
  Markdown or structured JSON. Use when the user wants to scrape a page,
  crawl a site, extract links or metadata, map a site, or convert a URL
  to Markdown for an LLM.
---
# CrawlForge Web Scraping
...
```

Honest framing: auto-activation is model-judged, not guaranteed. A good, trigger-rich description dramatically raises the probability the right skill fires, but it is a heuristic, not a contract — and you can always name the skill or tool explicitly in your prompt to force the issue.
The seven skills cover all 26 tools:
| Skill | Covers |
|---|---|
| crawlforge-getting-started | Onboarding, key setup, tool selection |
| crawlforge-web-scraping | scrape, crawl, map, extract links/metadata/text |
| crawlforge-deep-research | deep research, search, summarize, analyze |
| crawlforge-stealth-browsing | stealth mode, anti-bot, browser actions |
| crawlforge-structured-extraction | LLM extraction, templates, structured scrape |
| crawlforge-change-tracking | track changes, scheduled monitors |
| crawlforge-batch-automation | batch scrape, document processing, llms.txt |
Skills install to the personal scope at ~/.claude/skills/<name>/SKILL.md. They complement MCP rather than replace it: MCP exposes the 26 tools, the skills teach Claude when and how to reach for them — think onboarding guide for a new hire who already has the tools on their desk.
Upgrades self-heal. The installer removes the legacy bare files (leaving unrelated skills untouched), and npm run skills:gen regenerates the root SKILL.md. There is also an opt-in forced-eval hook — an idempotent UserPromptSubmit reminder that raises auto-activation — behind --with-hook on install-skills and init (and --remove-hook on uninstall-skills). It is off by default.
npx crawlforge init does the whole flow: configures your API key, installs the skills, and registers the MCP server with your AI clients.
Two new scrape formats: branding and screenshot
The scrape tool gains two output formats. Cost is unchanged at 2 credits for both.
The new branding format does static design-token extraction from HTML and CSS with no browser required. It returns the color palette, fonts and typography, logo and favicons, and border-radius, shadow, and spacing tokens. It is SSRF-guarded, and linked-CSS fetches are both count- and size-capped.
```json
{
  "tool": "scrape",
  "arguments": {
    "url": "https://stripe.com",
    "formats": ["branding"]
  }
}
```

The screenshot format now actually works — it was previously a no-op. It lazily renders via the shared browser pool and returns crawlforge://screenshot/{id} MCP resources. The browser launches only when a screenshot is requested, and if rendering fails it degrades to a warning so the rest of the scrape still succeeds (partial success preserved).
```json
{
  "tool": "scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["markdown", "screenshot"]
  }
}
```

Built-in scheduled change monitoring
track_changes gains real scheduled operations: create_scheduled_monitor, stop_scheduled_monitor, and a new list_scheduled_monitors. These were previously dead code that threw on call. They are now backed by a real persisted scheduler (MonitorScheduler.js + MonitorStore.js), and baselines rehydrate from snapshots on restart.
You can attach an optional plain-English goal. It is LLM-judged (Ollama-first) and degrades gracefully to threshold significance when no LLM is available — so a docs page or an API changelog can return a real significance verdict, not just a diff.
```json
{
  "tool": "track_changes",
  "arguments": {
    "operation": "create_scheduled_monitor",
    "url": "https://docs.example.com/changelog",
    "interval": "1h",
    "goal": "Tell me only when a breaking API change is announced"
  }
}
```

Now the honest part: a stdio MCP server is not a long-lived daemon, so reliable scheduled firing uses the CLI plus system cron. monitor:run-due is a one-shot that checks every due monitor and guarantees firing:
```bash
# Create and inspect monitors
crawlforge monitor:create --url https://docs.example.com/changelog --interval 1h
crawlforge monitor:list
# Drive due checks from system cron (every 15 minutes)
*/15 * * * * crawlforge monitor:run-due
```

track_changes costs 3 credits per call.
Security hardening: controls that were advertised but silently broken
MCP servers have become a recognized attack surface, and a scraping server that fetches arbitrary URLs on your behalf is a textbook SSRF target — point it at the cloud-metadata endpoint (169.254.169.254) and an unguarded fetch will happily hand the response back. We audited our own posture and found two controls we advertised but were not actually enforcing. Both are fixed.
SSRF is now enforced on the live path
ssrfProtection.js existed but was never wired into the tools — every scrape used raw fetch() with no IP or host validation. The new ssrfGuard.js injects an undici dispatcher whose connect-time lookup validates every connection (the initial request and every redirect hop) and pins to the validated IP, closing the DNS-rebinding TOCTOU window.
Stage 1 (the default) blocks loopback, link-local and cloud-metadata (169.254.169.254), and 0.0.0.0. It is now routed through roughly 14 modules: the basic fetch path, batch scrape, map site, crawl, extract, document processing, research, llms.txt, robots/sitemap, and the change-tracking differ.
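An added sketch of the connect-time check described above (not CrawlForge's ssrfGuard.js): a lookup wrapper that rejects any hostname resolving into the blocked ranges and hands the socket only the address it validated. The names isBlocked, makeGuardedLookup and the strict option are hypothetical; the ranges follow the post's description of Stage 1 and SSRF_STRICT.

```javascript
const net = require('node:net');

// Stage 1 ranges named in the post: loopback, link-local (covers 169.254.169.254), 0.0.0.0.
const stage1 = new net.BlockList();
stage1.addSubnet('127.0.0.0', 8, 'ipv4');
stage1.addSubnet('169.254.0.0', 16, 'ipv4');
stage1.addAddress('0.0.0.0', 'ipv4');
stage1.addAddress('::1', 'ipv6');
stage1.addAddress('::', 'ipv6');
stage1.addSubnet('fe80::', 10, 'ipv6');

// Strict mode adds the RFC1918 and IPv6 ULA private ranges.
const strictExtra = new net.BlockList();
strictExtra.addSubnet('10.0.0.0', 8, 'ipv4');
strictExtra.addSubnet('172.16.0.0', 12, 'ipv4');
strictExtra.addSubnet('192.168.0.0', 16, 'ipv4');
strictExtra.addSubnet('fc00::', 7, 'ipv6');

// True when an address must not be connected to. Non-IP input fails closed.
function isBlocked(address, { strict = false } = {}) {
  const family = net.isIP(address);
  if (family === 0) return true;
  const type = family === 4 ? 'ipv4' : 'ipv6';
  return stage1.check(address, type) || (strict && strictExtra.check(address, type));
}

// Wrap a dns.lookup-compatible resolver (hypothetical helper). The socket connects to
// the single address returned here, so the validated IP is the one actually used.
function makeGuardedLookup(resolve, policy = {}) {
  return (hostname, options, callback) => {
    if (typeof options === 'function') { callback = options; options = {}; }
    resolve(hostname, { ...options, all: true }, (err, records) => {
      if (err || records.length === 0) return callback(err || new Error('no address'));
      // Reject if ANY answer is internal: a mixed answer set is a rebinding signal.
      const bad = records.find((r) => isBlocked(r.address, policy));
      if (bad) {
        const e = new Error(`SSRF guard: ${hostname} -> ${bad.address} is blocked`);
        e.code = 'ESSRF_BLOCKED';
        return callback(e);
      }
      const pinned = records[0];
      return options.all ? callback(null, [pinned]) : callback(null, pinned.address, pinned.family);
    });
  };
}
```

One way to wire this into undici is to pass it as `connect.lookup` on an `Agent`, for example `new Agent({ connect: { lookup: makeGuardedLookup(dns.lookup) } })`. Node skips `lookup` entirely when the host is already an IP literal, so literal hosts in a URL need `isBlocked` applied directly before connecting.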
```bash
# Default: Stage 1 is on. Tighten or override as needed.
SSRF_STRICT=true                    # adds full RFC1918 / ULA private-range enforcement
ALLOWED_DOMAINS=internal.acme.dev   # trusted-host bypass for known internal targets
SSRF_PROTECTION_ENABLED=false       # kill switch
```

MCP elicitation now actually fires
The old ElicitationHelper called server.elicit() — a method that does not exist — and never checked the client capability, so every cost and safety confirmation silently failed open. It is fixed to call elicitInput, gate on the client's elicitation capability, and parse the action field (accept / decline / cancel). It still fails open for clients that do not support elicitation, but elicitation-capable clients will now see the prompts.
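An added sketch of the gating described above (not the project's ElicitationHelper): check the client's elicitation capability, call elicitInput, and treat only an explicit accept as approval. It assumes `server` is an MCP SDK Server exposing getClientCapabilities() and elicitInput(); confirmOperation, the `confirm` field and the `failOpen` option are hypothetical.

```javascript
// Hypothetical helper: ask the user to confirm a costly or risky operation.
// `failOpen` is an illustrative knob for the unsupported-client case,
// not a CrawlForge setting; the post says that case fails open.
async function confirmOperation(server, message, { failOpen = true } = {}) {
  const caps = typeof server.getClientCapabilities === 'function'
    ? server.getClientCapabilities()
    : undefined;
  if (!caps || !caps.elicitation || typeof server.elicitInput !== 'function') {
    // The client cannot be asked, so the decision comes from policy alone.
    return { proceed: failOpen, reason: 'elicitation-unsupported' };
  }
  let result;
  try {
    result = await server.elicitInput({
      message,
      requestedSchema: {
        type: 'object',
        properties: { confirm: { type: 'boolean', title: 'Proceed?' } },
        required: ['confirm'],
      },
    });
  } catch (err) {
    // A capable client that errors has not consented: fail closed here.
    return { proceed: false, reason: 'elicitation-error' };
  }
  // Only an explicit accept with confirm === true counts as approval;
  // decline, cancel and malformed replies all stop the operation.
  const approved = result?.action === 'accept' && result.content?.confirm === true;
  return { proceed: approved, reason: result?.action ?? 'malformed-response' };
}
```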
Defense in depth
Per-host outbound rate limiting (hostRateLimiter.js) was added to the basic fetch path and batch scrape: a default of 10 requests/second per host, gated by RATE_LIMIT_PER_DOMAIN. There is no global cap, so broad multi-host crawls are unaffected. This is defense-in-depth, not an SSRF boundary itself.
Finally, executeJavaScript (still off by default) gained a max script length (JS_MAX_SCRIPT_LENGTH), an explicit execution timeout (JS_EXECUTION_TIMEOUT_MS), and a structured stderr audit log recording each script's SHA-256, length, and URL.
Pricing: 26 metered tools, no new tools, no free lunch
There are no new tools in v4.8.0 — the new formats and operations were added to existing tools, so the count stays at 26. All 26 are metered and require an API key, with costs ranging from 1 to 10 credits per call. Note that list_ollama_models is now 1 credit — it is no longer free, and no tool is free per call.
| Plan | Price | Credits |
|---|---|---|
| Free | one-time (no card) | 1,000 trial credits (do not reset) |
| Hobby | $19/mo | 5,000 |
| Professional | $99/mo | 50,000 |
| Business | $399/mo | 250,000 |
Every plan includes every tool. LLM extraction defaults to local Ollama, so you do not need an OpenAI or Anthropic key unless you opt in.
How to upgrade
New users:
```bash
npm install -g crawlforge-mcp-server
npx crawlforge init
```

Existing users: npm install -g crawlforge-mcp-server@latest, or just trigger an /mcp reconnect. Re-run init (or install-skills) to pick up the 7 skills and self-heal any legacy bare files. v4.8.0 is additive, so nothing breaks.
This continues the 4.7.x correctness cadence: 4.7.2 ran a full live audit of all 26 tools and fixed scrape_with_actions, extract_structured, and resources/read; 4.7.1 fixed deep_research credibilityThreshold and a generate_llms_txt "undefined" bug; 4.7.0 moved to the fully metered model. If you are newer here, the v4.2.2 launch post covers the CLI that now powers monitor:run-due.
What is next
More of the same: a steady cadence of trust-and-correctness hardening across all 26 tools. If you find a control that does not behave the way the docs claim, that is exactly the bug we want to hear about — try it, break it, and tell us what does not work.
Ready to try it? Start free with 1,000 credits — then run npx crawlforge init to install the 7 skills and register the MCP server. See the full docs, the track_changes reference, or our roundup of the best MCP servers for web scraping in 2026.