Web Scraping From the CLI: The CrawlForge CLI Guide

CrawlForge Team
Engineering Team
May 21, 2026
10 min read

Quick Answer

The CrawlForge CLI is a terminal-first wrapper around all 26 CrawlForge tools. It ships inside the crawlforge-mcp-server package as the `crawlforge` command, works without an MCP client, outputs JSON for shell pipelines, and installs in 30 seconds with `npm install -g crawlforge-mcp-server`. Use it for cron jobs, CI/CD steps, one-off research, and any workflow where you would otherwise reach for curl plus a custom parser.

Most AI tools love to be agents. The CrawlForge CLI is built for the opposite: scriptable, terminal-first, predictable. You install it, set an environment variable, and every one of CrawlForge's 26 tools becomes a shell command. JSON in, JSON out. Pipe to jq, schedule with cron, run in CI -- it works the same way everywhere.

Table of Contents

  • What Is the CrawlForge CLI?
  • Install in 30 Seconds
  • The 15 Commands at a Glance
  • Your First Scrape
  • Piping JSON Output to jq
  • Scheduling With Cron
  • CLI vs MCP vs Raw API
  • Three Real-World Workflows
  • Global Flags Reference
  • What It Costs

What Is the CrawlForge CLI?

The CrawlForge CLI ships inside the crawlforge-mcp-server package as the crawlforge command and exposes all 26 CrawlForge tools as terminal commands. A single global install gives you both the MCP server and the CLI. It does not need a long-running process or an MCP client: you type crawlforge scrape <url>, it makes an HTTPS call to CrawlForge's API, and prints JSON to stdout. That is the entire story.

It exists because half the scraping work people do is not agent-shaped. Cron jobs, CI steps, one-off research, ad-hoc pulls from a shell -- those want plain old commands, not a JSON-RPC handshake.

Install in 30 Seconds

Bash
npm install -g crawlforge-mcp-server
export CRAWLFORGE_API_KEY="cf_live_your_key_here"
crawlforge --help

That is it. No config file, no auth flow, no service to start. If you do not have an API key yet, grab one at crawlforge.dev/signup -- you get 1,000 free credits on signup.

To make the env var permanent on macOS or Linux:

Bash
# zsh shown here; if your login shell is bash, append to ~/.bashrc instead
echo 'export CRAWLFORGE_API_KEY="cf_live_..."' >> ~/.zshrc
source ~/.zshrc

On Windows (PowerShell):

PowerShell
[Environment]::SetEnvironmentVariable("CRAWLFORGE_API_KEY", "cf_live_...", "User")
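
Because the install comes down to one binary and one environment variable, a script can check both before doing any work. The following is a minimal sketch, not part of the CLI: `crawlforge_preflight` is a hypothetical helper name, and the return codes 1 and 2 are arbitrary choices for this example.

```shell
# Hypothetical helper: fail fast when the CLI or the API key is missing.
crawlforge_preflight() {
  if ! command -v crawlforge >/dev/null 2>&1; then
    echo "crawlforge not found on PATH; run: npm install -g crawlforge-mcp-server" >&2
    return 1
  fi
  if [ -z "${CRAWLFORGE_API_KEY:-}" ]; then
    echo "CRAWLFORGE_API_KEY is not set" >&2
    return 2
  fi
  return 0
}

# Usage (commented out so that sourcing this file has no side effects):
# crawlforge_preflight && crawlforge scrape https://example.com
```

Calling this at the top of a cron or CI script turns a missing key into an explicit message instead of a failure from the first real command.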

The 15 Commands at a Glance

Every command maps to one or more CrawlForge tools:

| Command | Primary tool | Credits | Example |
| --- | --- | --- | --- |
| scrape | fetch_url, extract_content | 1-2 | crawlforge scrape https://example.com |
| search | search_web | 5 | crawlforge search "MCP servers 2026" |
| crawl | crawl_deep | 4 | crawlforge crawl https://docs.example.com --depth 3 |
| map | map_site | 2 | crawlforge map https://example.com |
| extract | extract_with_llm | 3 | crawlforge extract <url> --schema schema.json |
| track | track_changes | 3 | crawlforge track <url> --threshold 10 |
| analyze | analyze_content | 3 | crawlforge analyze <url> |
| research | deep_research | 10 | crawlforge research "AI agents in 2026" |
| stealth | stealth_mode | 5 | crawlforge stealth <url> |
| batch | batch_scrape | 5 | crawlforge batch urls.txt |
| actions | scrape_with_actions | 5 | crawlforge actions <url> --script steps.json |
| localize | localization | 2 | crawlforge localize <url> --country DE |
| llmstxt | generate_llms_txt | 5 | crawlforge llmstxt https://example.com |
| template | scrape_template | 1 | crawlforge template amazon-product <url> |
| monitor | track_changes | 3 | crawlforge monitor <url> --interval 3600 |
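
The credit column can be turned into a quick budget check before a script runs. This sketch copies the figures from the table and assumes each figure is charged once per invocation; `command_credits` and `estimate_credits` are hypothetical helper names, and `scrape` is counted at its upper bound of 2.

```shell
# Credits per command, copied from the table in this guide.
# scrape is listed as 1-2, so the worst case (2) is used here.
command_credits() {
  case "$1" in
    template) echo 1 ;;
    scrape|map|localize) echo 2 ;;
    extract|track|analyze|monitor) echo 3 ;;
    crawl) echo 4 ;;
    search|stealth|batch|actions|llmstxt) echo 5 ;;
    research) echo 10 ;;
    *) echo "unknown command: $1" >&2; return 1 ;;
  esac
}

# Sum the worst-case credits for a list of command names.
estimate_credits() {
  local total=0 cmd cost
  for cmd in "$@"; do
    cost=$(command_credits "$cmd") || return 1
    total=$((total + cost))
  done
  echo "$total"
}

# Usage: estimate_credits scrape search research
```

An unknown command name makes `estimate_credits` fail rather than silently count it as zero.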

Your First Scrape

The simplest possible call:

Bash
crawlforge scrape https://news.ycombinator.com

What comes back is the page's main content as JSON:

JSON
{
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "content": "Hacker News new | past | comments | ask...",
  "links": ["https://news.ycombinator.com/from?site=...", "..."],
  "fetched_at": "2026-05-21T10:14:33Z",
  "credits_used": 1
}

Want just the URLs? Pipe to jq:

Bash
crawlforge scrape https://news.ycombinator.com --json | jq '.links[]'

Want it in a file? Redirect stdout:

Bash
crawlforge scrape https://news.ycombinator.com --pretty > hn.json

Piping JSON Output to jq

This is the workflow that makes the CLI worth installing. Everything outputs JSON, and JSON pipes into anything.

Get the HN front-page story titles:

Bash
crawlforge template hacker-news-front-page https://news.ycombinator.com --json \
  | jq -r '.stories[] | .title'

Search the web and extract URLs:

Bash
crawlforge search "best web scraping libraries 2026" --json \
  | jq '.results[] | .url'

Scrape a page and count words:

Bash
crawlforge scrape https://example.com --json \
  | jq -r '.content' \
  | wc -w

Batch scrape, then filter for error responses:

Bash
crawlforge batch urls.txt --json \
  | jq '.results[] | select(.status_code >= 400)'

The pattern: --json gives you machine-readable output, then jq slices and dices.
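
One caveat with plain pipelines: the shell reports the exit status of the last command, so a failed scrape can pass unnoticed while jq runs on empty input. The sketch below captures the output first and only then hands it to jq. It assumes the CLI exits non-zero on failure and that the result carries a `.links` array as in the sample response; `scrape_links` is a hypothetical name.

```shell
# Hypothetical wrapper: print one link per line, or fail before jq runs.
# Assumes crawlforge exits non-zero when the scrape fails.
scrape_links() {
  local out
  out=$(crawlforge scrape "$1" --json) || {
    echo "scrape failed for $1" >&2
    return 1
  }
  printf '%s\n' "$out" | jq -r '.links[]'
}

# Usage: scrape_links https://news.ycombinator.com > links.txt
```

In bash, `set -o pipefail` is an alternative that keeps the one-line pipeline form.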

Scheduling With Cron

A daily check on a competitor's pricing page:

Bash
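# crontab -e
# cron does not load your shell profile, so define the key in the crontab itself
CRAWLFORGE_API_KEY=cf_live_...
0 9 * * * /usr/local/bin/crawlforge track https://competitor.com/pricing --json > /var/log/pricing.json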

A nightly research run:

Bash
0 2 * * * /usr/local/bin/crawlforge research "AI tooling news" --depth standard --pretty > /var/log/research.json

A weekly llms.txt regeneration for your own site:

Bash
0 3 * * 0 /usr/local/bin/crawlforge llmstxt https://yoursite.com --include-full > /var/www/yoursite.com/llms.txt
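
A plain `>` redirect truncates the target file before the command runs, so a failed cron run leaves an empty file behind. One way around that is to write to a temporary file and move it into place only on success. This is a sketch under that assumption; `write_if_ok` is a hypothetical helper, not a CLI feature.

```shell
# Hypothetical helper: run a command and replace the target file only if the
# command succeeds, so a failed run keeps the previous output intact.
write_if_ok() {
  local target=$1 tmp status
  shift
  tmp="${target}.tmp.$$"
  if "$@" > "$tmp"; then
    mv "$tmp" "$target"
  else
    status=$?
    rm -f "$tmp"
    return "$status"
  fi
}

# Usage:
# write_if_ok /var/log/pricing.json \
#   crawlforge track https://competitor.com/pricing --json
```

The temporary file sits next to the target so the final `mv` stays on one filesystem and replaces the old file in a single step.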

In CI? Use the same commands in your GitHub Actions YAML. The CLI checks CRAWLFORGE_API_KEY first, so just set it as a repository secret.

YAML
# .github/workflows/daily-research.yml
- name: Run daily research
  env:
    CRAWLFORGE_API_KEY: ${{ secrets.CRAWLFORGE_API_KEY }}
  run: |
    npm install -g crawlforge-mcp-server
    crawlforge research "industry news" --depth standard --pretty > report.json

CLI vs MCP vs Raw API: When to Use Each

| Workflow | Use the CLI | Use MCP | Use Raw API |
| --- | --- | --- | --- |
| One-off scrape from your terminal | yes | no | no |
| Cron job or CI step | yes | no | only if you need to |
| Claude / Cursor / Windsurf agent | no | yes | no |
| Embedded in a Node/Python service | no | only if MCP-shaped | yes |
| Long-running background worker | no | no | yes |
| Quick exploration of an unfamiliar site | yes | maybe | no |

Rule of thumb: if a human is typing the command, use the CLI. If an LLM is selecting the tool, use MCP. If a server is calling it in a loop, use the raw API.

Three Real-World Workflows

1. Competitive Pricing Monitor

A shell script that runs daily, scrapes three competitor pricing pages, diffs against yesterday's snapshot, and posts to Slack if anything changed.

Bash
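#!/bin/bash
# Snapshot each competitor page and compare it with yesterday's copy.
today=$(date +%F)
# BSD/macOS date takes -v-1d; GNU date takes -d yesterday.
yesterday=$(date -v-1d +%F 2>/dev/null || date -d yesterday +%F)
mkdir -p snapshots

while IFS= read -r url; do
  # Name files after the whole URL so pages that share a basename do not collide.
  slug=$(printf '%s' "$url" | tr -c 'A-Za-z0-9._-' '_')
  crawlforge track "$url" --json > "snapshots/$today-$slug.json"

  # Alert only when yesterday's snapshot exists and differs from today's.
  old="snapshots/$yesterday-$slug.json"
  if [ -f "$old" ] && ! diff -q "$old" "snapshots/$today-$slug.json" > /dev/null; then
    curl -X POST "$SLACK_WEBHOOK" -d '{"text": "Pricing changed"}'
  fi
done < competitors.txt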

Cost: ~9 credits per day (3 competitors × 3 credits for track).

2. Lead Enrichment From a CSV

Read a CSV of company domains, scrape each homepage for contact info, write enriched data back.

Bash
while IFS=, read -r company domain; do
  data=$(crawlforge scrape "https://$domain" --json)
  email=$(echo "$data" | jq -r '.metadata.contact_email // empty')
  echo "$company,$domain,$email" >> enriched.csv
done < companies.csv

Cost: 1 credit per company.
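
Real CSV exports often carry quotes, Windows line endings, or full URLs in the domain column, any of which would break the `https://$domain` URL built in the loop. A small cleanup step can run before each scrape. This is a sketch; `normalize_domain` is a hypothetical helper and handles only the cases named in its comments.

```shell
# Hypothetical helper: reduce a CSV domain field to a bare, lowercase host name.
normalize_domain() {
  local d=$1
  # Drop carriage returns, double quotes, and spaces, then lowercase.
  d=$(printf '%s' "$d" | tr -d '\r" ' | tr 'A-Z' 'a-z')
  # Strip a leading scheme and anything after the first slash.
  d=${d#http://}
  d=${d#https://}
  d=${d%%/*}
  printf '%s\n' "$d"
}

# Usage inside the loop:
# domain=$(normalize_domain "$domain")
```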

3. Research Report Pipeline

A weekly Sunday cron that runs a research query and emails the synthesized summary to the team.

Bash
crawlforge research "AI agent frameworks news this week" --depth deep --pretty > report.json
jq -r '.summary' report.json \
  | mail -s "Weekly AI report" team@example.com

Cost: 10 credits per run (research includes the synthesized summary).

Global Flags Reference

These work on every command:

  • --json -- compact, machine-readable JSON (pipe-friendly)
  • --pretty -- pretty-printed JSON
  • --quiet -- suppress all stdout output (exit code only)
  • --api-key <key> -- override the CRAWLFORGE_API_KEY env var
  • --timeout <ms> -- override the default 30s timeout

To write results to a file, redirect stdout: crawlforge scrape <url> --pretty > out.json.
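
Since `--quiet` leaves only the exit code, it can act as a yes/no check in a script. The sketch below combines it with `--timeout`. It assumes the CLI exits 0 on success and non-zero on failure, which this guide implies but does not spell out; `can_scrape` is a hypothetical name.

```shell
# Hypothetical check built on --quiet: report only whether the scrape succeeded.
# The second argument is the timeout in milliseconds (default 30000, matching
# the 30s default listed for --timeout).
can_scrape() {
  crawlforge scrape "$1" --quiet --timeout "${2:-30000}"
}

# Usage:
# if can_scrape https://example.com 10000; then
#   echo "reachable"
# else
#   echo "scrape failed" >&2
# fi
```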

What It Costs

The CLI itself is free. You pay only for the underlying tool calls, billed against your existing credit balance. No extra subscription, no per-invocation fee. A daily cron that runs track against three URLs (about 270 credits a month) plus research once a week (about 40 credits) costs roughly 310 credits per month -- well within the 1,000-credit free tier.
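
Working the numbers from this guide's credit table (track = 3 credits, research = 10 credits) gives a quick monthly estimate. The function below is an illustrative sketch; `monthly_credits` is a hypothetical name, and the 30-day month with 4 weekly runs is an assumption.

```shell
# Monthly credits = daily track calls plus weekly research runs.
# Arguments: number of tracked URLs, days in the month, research runs.
monthly_credits() {
  local urls=$1 days=$2 research_runs=$3
  echo $(( urls * 3 * days + research_runs * 10 ))
}

monthly_credits 3 30 4   # prints 310
```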


Ready to install? Get your free API key at crawlforge.dev/signup and run npm install -g crawlforge-mcp-server. New here? Read the v4.2.2 launch announcement for everything new, or the original MCP quickstart for the MCP version instead.

Tags

CLI, web-scraping, tutorial, terminal, automation, scripting

Frequently Asked Questions

Is the CrawlForge CLI free?

The CLI package itself is free and open. You pay only for the underlying tool calls billed against your normal CrawlForge credit balance, the same as you would from MCP or the raw API. There is no extra per-invocation fee.

Do I need a CrawlForge API key to use the CLI?

Yes. The CLI reads the CRAWLFORGE_API_KEY environment variable on every call. Get a free key at crawlforge.dev/signup (no credit card required) and set it once in your shell profile.

Can I use the CrawlForge CLI in CI/CD pipelines?

Yes -- this is one of its primary use cases. Install via "npm install -g crawlforge-mcp-server" in your CI runner, set CRAWLFORGE_API_KEY as a repository secret, and run any command. It works the same in GitHub Actions, GitLab CI, CircleCI, and Jenkins.

How is the CrawlForge CLI different from curl?

curl gives you raw HTML. The CrawlForge CLI returns structured JSON: cleaned content, extracted metadata, links, headings, and tool-specific fields like search results, research summaries, or template-scraped product data. It also handles anti-bot defenses, stealth mode, and browser automation -- all things curl cannot do.

Does the CLI support all 26 CrawlForge tools?

Yes. The 15 commands cover all 26 tools (some commands expose multiple tools via flags). For example, "crawlforge extract" maps to extract_with_llm by default and extract_structured with the --css flag.

Can the CrawlForge CLI output structured data for parsing?

Yes -- pass --json on any command and the output is clean JSON suitable for piping into jq or any JSON-aware tool. Use --pretty for human-readable formatting, or redirect stdout to a file (crawlforge scrape <url> --pretty > out.json).
