On this page
A web extraction task should not require sending your scraped data to OpenAI. With extract_with_llm and a local Ollama install, structured extraction now runs entirely on your machine -- no API key, no per-token cost, no third party seeing your content. This post is the deep dive: why local matters, how to set it up, and when to use it versus the alternatives.
Table of Contents
- What Is LLM Web Extraction?
- Why Local Ollama (and Not OpenAI)?
- Setup: Ollama + extract_with_llm in 5 Minutes
- list_ollama_models: Discover What Is On Your Machine
- Schema-Driven Extraction
- When to Use Ollama vs OpenAI vs Anthropic
- Cost Comparison
- Limitations and Accuracy
What Is LLM Web Extraction?
LLM web extraction is the workflow where you hand a model two things -- the raw HTML of a page and a schema describing what you want -- and the model returns structured JSON. It is the modern replacement for hand-written CSS selectors. When a site's layout changes, the selectors break; the LLM adapts.
CrawlForge has had this for a while as extract_structured (CSS-first, LLM fallback). The new extract_with_llm tool is LLM-first: every extraction is an LLM call. What changed in v4.2.2 is the default provider. Previously you needed an OpenAI or Anthropic key. Now you do not.
Why Local Ollama (and Not OpenAI)?
Three reasons, in descending order of importance for most teams.
1. Your scraped data stays on your machine. When you call OpenAI, the page content goes to OpenAI. For competitive intelligence, legal docs, customer data, or anything regulated, that is often a deal-breaker. Local Ollama means localhost only.
2. The LLM is free. A gpt-4o-mini call costs around $0.0001 per extraction at scale. Sounds like nothing -- until you are running 10,000 extractions a day. Ollama is $0 plus electricity. You still pay 3 CrawlForge credits per call regardless of provider.
3. No new API key to manage. If you have Ollama, you have a provider. No signup, no billing alerts, no rotation.
The trade-off is model quality. A llama3.1:8b on a laptop will not match gpt-4o on hard pages. For the 80% case (clean HTML, simple schema), it is fine. For the 20% (heavily formatted, ambiguous), you can fall back to OpenAI or Anthropic in the same call.
Setup: Ollama + extract_with_llm in 5 Minutes
That is the Ollama side. Now from CrawlForge (via the CLI, the MCP server, or the API -- any path works):
From the CLI:
list_ollama_models: Discover What Is On Your Machine
Before running a big batch, sanity-check what is actually installed:
Or via MCP/SDK:
This costs 0 credits. It is a discovery helper, not an extraction call. Useful in onboarding flows, CI smoke tests ("is Ollama set up correctly?"), and as the first step in any local-LLM pipeline.
Schema-Driven Extraction
The schema is a standard JSON Schema object. The LLM reads it and produces matching output. Two patterns work well:
Pattern 1: Flat schema for simple data
Pattern 2: Nested schema for richer data
The smaller the schema, the more reliable the extraction -- especially on local 8B models. If you find a model hallucinating fields, simplify.
When to Use Ollama vs OpenAI vs Anthropic
| Scenario | Recommended provider | Why |
|---|---|---|
| High-volume batch extraction | Ollama | Marginal LLM cost = $0 |
| Sensitive or regulated data | Ollama | Localhost only |
| Complex, multi-page reasoning | OpenAI (gpt-4o) | Best reasoning on hard pages |
| Long context (50K+ tokens) | Anthropic (Claude) | 200K context window |
| Quick prototype, low volume | Ollama | Zero setup beyond install |
| Ambiguous content (sarcasm, slang) | OpenAI or Anthropic | Larger model = better disambiguation |
The good news: switching providers is one parameter change. You can prototype on Ollama, hit a hard page, switch to provider: "openai" for that one call, and continue.
Cost Comparison
Cost-per-1,000-extractions, assuming average page (~3K tokens in, ~200 tokens out):
| Provider | Model | LLM cost per 1K | CrawlForge credits per 1K | Total per 1K |
|---|---|---|---|---|
| Ollama (local) | llama3.1:8b | $0 (electricity) | 3,000 credits | 3,000 credits |
| Ollama (local) | llama3.1:70b | $0 (electricity, more) | 3,000 credits | 3,000 credits |
| OpenAI | gpt-4o-mini | ~$0.45 | 3,000 credits | 3,000 credits + $0.45 |
| OpenAI | gpt-4o | ~$8.50 | 3,000 credits | 3,000 credits + $8.50 |
| Anthropic | claude-haiku-4-5 | ~$0.30 | 3,000 credits | 3,000 credits + $0.30 |
| Anthropic | claude-sonnet-4-6 | ~$4.50 | 3,000 credits | 3,000 credits + $4.50 |
CrawlForge credits are flat across providers -- you pay for the orchestration, not the model. At our Hobby tier (10,000 credits/$19/mo), that is roughly 3,300 LLM extractions per month for $19 + $0 (if you use Ollama).
Limitations and Accuracy
What local Ollama (llama3.1:8b) does well in our internal benchmarks:
- Clean product pages (Amazon, Shopify): ~95% field accuracy
- Article metadata (Substack, Medium): ~92%
- Forum threads (HN, Reddit): ~88%
What it struggles with:
- Heavily JS-rendered pages with minimal HTML (use
scrape_with_actionsfirst, then extract) - Pages with ambiguous structure (use a frontier model)
- Schemas with 20+ fields (split into multiple extractions)
- Non-English content (use a multilingual model like
mistral)
If accuracy is critical and Ollama is missing fields, the fix is usually: simpler schema, or upgrade to provider: "openai" for that call.
Ready to extract data without leaving your machine? Start free with 1,000 credits and try extract_with_llm with your local Ollama. New to CrawlForge? Read the v4.2.2 launch announcement for context, or the CLI guide for a terminal-first workflow.