Half the scraping requests we see at CrawlForge are for the same ten sites: Amazon, LinkedIn, GitHub, YouTube, Reddit, Hacker News, Stack Overflow, npm, Product Hunt, and Twitter/X. We got tired of watching people write the same CSS selectors over and over -- and watching those selectors break the next time the site updated its layout. So we did the work once, packaged it as scrape_template, and now you pay 1 credit and get structured JSON.
Table of Contents
- What Is scrape_template?
- The 10 Supported Sites
- Quick Start: Scrape an Amazon Product
- LinkedIn Profiles (With Legal Notes)
- GitHub Repos for AI Training Data
- The Other Seven Templates
- scrape_template vs scrape_structured vs extract_with_llm
- Limitations
What Is scrape_template?
scrape_template is a single CrawlForge tool with ten pre-built site schemas. You pick the template, pass a URL, and get back structured JSON matching that site's natural shape. No CSS selectors. No HTML parsing. No schema definition.
The trade-off: you only get the ten sites we maintain. If you need something else, use scrape_structured (CSS-first) or extract_with_llm (LLM-first). For the steady stream of "I want product data from Amazon" requests, scrape_template is the shortest path. Need a multi-step workflow instead of a single site? See how to use the templates gallery.
It costs 1 credit per scrape -- the same as a basic fetch_url -- because we have already done the schema work upstream.
The 10 Supported Sites
| Template | Returns | Best for | Example URL pattern |
|---|---|---|---|
| amazon-product | Title, price, rating, review count, images, ASIN, availability | Price monitoring, product research | /dp/<ASIN> |
| linkedin-profile | Name, headline, location, about, current company | Lead enrichment | /in/<handle> |
| github-repo | Stars, forks, language, topics, license, last updated | Repo analysis, AI training data | /<owner>/<repo> |
| youtube-video | Title, channel, views, duration, published, description | Content research | /watch?v=<id> |
| reddit-thread | Post title, score, author, subreddit, body | Community signals | /r/<sub>/comments/<id> |
| hacker-news-front-page | Front-page stories: title, URL, score, author, comments | Tech trend tracking | news.ycombinator.com |
| stackoverflow-question | Question, accepted answer, vote counts, tags | Developer Q&A mining | /questions/<id> |
| npm-package | Package metadata, weekly downloads, version, maintainers | Dependency analysis | /package/<name> |
| producthunt-launch | Product, tagline, upvotes, topics, website | Launch monitoring | /posts/<slug> |
| tweet | Text, author, URL, image | Social listening | /<user>/status/<id> |
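The URL patterns in the last column are enough to route a URL to a template name before you spend a credit. Here is a minimal Python sketch of that routing. `pick_template` and `ROUTES` are hypothetical helpers, not part of CrawlForge, and the domain names are assumptions taken from the example URLs in this post.

```python
import re
from urllib.parse import urlparse

# (domain, regex over path[?query], template name).
# Path patterns come from the table above; the domains are assumptions
# based on the example URLs used elsewhere in this post.
ROUTES = [
    ("amazon.com", r"(^|/)dp/[^/]+", "amazon-product"),
    ("linkedin.com", r"^/in/[^/]+", "linkedin-profile"),
    ("github.com", r"^/[^/]+/[^/]+/?$", "github-repo"),
    ("youtube.com", r"^/watch\?(.*&)?v=[^&]+", "youtube-video"),
    ("reddit.com", r"^/r/[^/]+/comments/[^/]+", "reddit-thread"),
    ("news.ycombinator.com", r"^/?$", "hacker-news-front-page"),
    ("stackoverflow.com", r"^/questions/\d+", "stackoverflow-question"),
    ("npmjs.com", r"^/package/.+", "npm-package"),
    ("producthunt.com", r"^/posts/[^/]+", "producthunt-launch"),
    ("x.com", r"^/[^/]+/status/\d+", "tweet"),
    ("twitter.com", r"^/[^/]+/status/\d+", "tweet"),
]


def pick_template(url):
    """Return the template name for a URL, or None if no pattern matches."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    # Match against the path plus the query string, so ?v=<id> is visible.
    target = parts.path + ("?" + parts.query if parts.query else "")
    for domain, pattern, template in ROUTES:
        same_site = host == domain or host.endswith("." + domain)
        if same_site and re.search(pattern, target):
            return template
    return None
```

A `None` result means the URL is outside the ten templates, which is the cue to fall back to scrape_structured or extract_with_llm.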
Quick Start: Scrape an Amazon Product
crawlforge template amazon-product "https://www.amazon.com/dp/B0CHX1W1XY"

Output:

{
  "asin": "B0CHX1W1XY",
  "title": "Logitech MX Master 3S Wireless Performance Mouse",
  "price": { "amount": 99.99, "currency": "USD" },
  "rating": 4.7,
  "review_count": 12483,
  "in_stock": true,
  "images": ["https://m.media-amazon.com/...", "..."],
  "credits_used": 1
}

From an MCP client like Claude Code:
"Use scrape_template with the amazon-product template to get the current price and rating for ASIN B0CHX1W1XY."
Claude picks the tool, formats the call, and returns the data. One credit.
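Because the result is plain JSON, a price-monitoring check takes only a few lines. The sketch below parses a copy of the sample output from this section. The field names are copied from that sample and may differ in live responses, and `price_alert` and its threshold are hypothetical.

```python
import json

# A trimmed copy of the sample output shown in this section.
SAMPLE = """
{
  "asin": "B0CHX1W1XY",
  "title": "Logitech MX Master 3S Wireless Performance Mouse",
  "price": { "amount": 99.99, "currency": "USD" },
  "rating": 4.7,
  "review_count": 12483,
  "in_stock": true,
  "credits_used": 1
}
"""


def price_alert(raw_json, max_price, currency="USD"):
    """Return an alert string if the item is in stock at or below max_price."""
    product = json.loads(raw_json)
    price = product.get("price") or {}
    amount = price.get("amount")
    # Skip items with no price or a price in a different currency.
    if amount is None or price.get("currency") != currency:
        return None
    if product.get("in_stock") and amount <= max_price:
        return f'{product["title"]} is {amount:.2f} {currency}'
    return None


print(price_alert(SAMPLE, max_price=100))
```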
LinkedIn Profiles (With Legal Notes)
crawlforge template linkedin-profile "https://www.linkedin.com/in/satyanadella"

Output:

{
  "name": "Satya Nadella",
  "headline": "Chairman and CEO at Microsoft",
  "location": "Redmond, Washington",
  "current_role": { "title": "CEO", "company": "Microsoft", "since": "2014-02" },
  "experience_count": 6,
  "skills_top": ["Leadership", "Strategy", "Cloud Computing"],
  "credits_used": 1
}

A note on LinkedIn scraping. LinkedIn's terms of service restrict automated access. In hiQ Labs v. LinkedIn (9th Circuit, 2022), the court held that scraping publicly accessible profile data likely does not violate the Computer Fraud and Abuse Act. That ruling does not override LinkedIn's terms of service: commercial use, login-required scraping, and aggressive request rates can still trigger breach-of-contract claims and account bans. Use scrape_template with the linkedin-profile template for public, low-frequency, non-resold data only.
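One way to keep LinkedIn requests low-frequency is to enforce a minimum gap between calls on the client side. This post does not put a number on "low-frequency", so the 60-second interval in the usage lines is an arbitrary placeholder, and `MinIntervalThrottle` is a hypothetical helper rather than a CrawlForge feature.

```python
class MinIntervalThrottle:
    """Tracks when the last request went out and how long to wait."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._last = None  # timestamp of the most recent request, if any

    def seconds_to_wait(self, now):
        """Seconds to sleep before a request at time `now` is allowed."""
        if self._last is None:
            return 0.0
        return max(0.0, self._last + self.min_interval_s - now)

    def record(self, now):
        """Call right after sending a request."""
        self._last = now


# Usage with explicit timestamps (in real code, pass time.monotonic()).
throttle = MinIntervalThrottle(min_interval_s=60)  # placeholder interval
throttle.record(now=0.0)
print(throttle.seconds_to_wait(now=10.0))  # 50.0
```

Passing the clock in as an argument keeps the class easy to test without sleeping.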
GitHub Repos for AI Training Data
crawlforge template github-repo "https://github.com/anthropics/anthropic-sdk-python"

Output:

{
  "owner": "anthropics",
  "name": "anthropic-sdk-python",
  "stars": 1842,
  "forks": 287,
  "primary_language": "Python",
  "languages": { "Python": 98.4, "Makefile": 1.6 },
  "license": "MIT",
  "topics": ["claude", "anthropic", "sdk"],
  "readme_markdown": "# Anthropic Python SDK...",
  "last_commit_at": "2026-05-19T14:22:11Z",
  "credits_used": 1
}

This template is heavily used for AI training-data pipelines -- pulling READMEs at scale across thousands of repos. Pair it with batch_scrape to process a CSV of repo URLs.
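Before handing a CSV to a batch job, it helps to drop duplicates and malformed rows so you don't pay for them. The sketch below covers only that preparation step. The `url` column name and the `load_repo_urls` helper are assumptions, and the batch_scrape call itself is left out because its input format is not covered in this post.

```python
import csv
import io
import re

# Matches https://github.com/<owner>/<repo>, per the URL pattern in the table.
REPO_URL = re.compile(r"^https://github\.com/[^/\s]+/[^/\s]+$")


def load_repo_urls(csv_text, column="url"):
    """Return unique, well-formed repo URLs in first-seen order."""
    seen, urls = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize trailing slashes so duplicates collapse to one entry.
        url = (row.get(column) or "").strip().rstrip("/")
        if REPO_URL.match(url) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


CSV_TEXT = """url
https://github.com/anthropics/anthropic-sdk-python
https://github.com/anthropics/anthropic-sdk-python/
https://example.com/not-a-repo
"""

repos = load_repo_urls(CSV_TEXT)
# At 1 credit per template scrape, the credit estimate is just the row count.
print(len(repos), "repos,", len(repos) * 1, "credits")
```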
The Other Seven Templates
YouTube -- title, channel, views, full transcript when available:

crawlforge template youtube-video "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Reddit -- post + comment tree:

crawlforge template reddit-thread "https://www.reddit.com/r/programming/comments/<id>"

Hacker News -- the front page as a list of stories:

crawlforge template hacker-news-front-page "https://news.ycombinator.com"
# returns up to 30 front-page stories; slice the top 10 with jq:
crawlforge template hacker-news-front-page "https://news.ycombinator.com" --json | jq '.stories[:10]'

Stack Overflow -- question, accepted answer, top alternatives:

crawlforge template stackoverflow-question "https://stackoverflow.com/questions/12345678"

npm -- package metadata + weekly downloads:

crawlforge template npm-package "https://www.npmjs.com/package/next"

Product Hunt -- product, makers, upvotes:

crawlforge template producthunt-launch "https://www.producthunt.com/posts/crawlforge"

Twitter/X -- single tweet with engagement and replies:

crawlforge template tweet "https://x.com/elonmusk/status/<id>"

All return JSON. All cost 1 credit. All maintained centrally -- when LinkedIn or Amazon updates their layout, we update the template.
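If you would rather rank the Hacker News stories by score than take the first ten in page order, the same JSON can be post-processed in Python. This sketch assumes the `stories` key from the jq filter above and the `title` and `score` fields listed in the template table. The payload is placeholder data for illustration, not real output.

```python
import json


def top_stories(raw_json, n=10):
    """Return the n highest-scoring stories, highest first."""
    stories = json.loads(raw_json).get("stories", [])
    return sorted(stories, key=lambda s: s.get("score", 0), reverse=True)[:n]


# Placeholder payload; the titles and scores are made up.
PLACEHOLDER = json.dumps({
    "stories": [
        {"title": "story-a", "score": 10},
        {"title": "story-b", "score": 50},
        {"title": "story-c", "score": 30},
    ]
})

for story in top_stories(PLACEHOLDER, n=2):
    print(story["score"], story["title"])
```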
scrape_template vs scrape_structured vs extract_with_llm
A decision tree:
Is your target one of the 10 supported sites?
  Yes -> use scrape_template (1 credit, maintained for you)
  No
    Do you know the CSS selectors and are they stable?
      Yes -> use scrape_structured (2 credits, you maintain selectors)
      No -> use extract_with_llm (3 credits, schema-based, layout-resilient)
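The same decision tree reads naturally as a small function. This is a sketch of the logic above, not a CrawlForge API. `choose_tool` and its two boolean inputs are hypothetical names.

```python
def choose_tool(is_supported_site, selectors_known_and_stable):
    """Return (tool name, credits per call) following the decision tree."""
    if is_supported_site:
        return ("scrape_template", 1)
    if selectors_known_and_stable:
        return ("scrape_structured", 2)
    return ("extract_with_llm", 3)


# A site outside the ten templates, with no stable selectors to rely on:
print(choose_tool(is_supported_site=False, selectors_known_and_stable=False))
```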
Quick comparison:
| | scrape_template | scrape_structured | extract_with_llm |
|---|---|---|---|
| Credits | 1 | 2 | 3 |
| Coverage | 10 specific sites | Any site you can write selectors for | Any site |
| Maintenance | We maintain | You maintain | LLM adapts |
| Speed | Fast (cached schemas) | Fast | Slower (LLM call) |
| Best for | Popular sites, high volume | Specific known structure | Unknown or shifting structure |
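The credit row turns into simple budget arithmetic. The sketch below uses the per-call costs from the table. The helper names are hypothetical, and the 1,000-credit budget in the usage lines is only an example figure.

```python
# Credits per call, copied from the comparison table above.
CREDITS_PER_CALL = {
    "scrape_template": 1,
    "scrape_structured": 2,
    "extract_with_llm": 3,
}


def credits_needed(tool, pages):
    """Total credits to scrape `pages` pages with one call per page."""
    return pages * CREDITS_PER_CALL[tool]


def pages_within_budget(tool, credit_budget):
    """How many pages a credit budget covers with the given tool."""
    return credit_budget // CREDITS_PER_CALL[tool]


for tool in CREDITS_PER_CALL:
    print(tool, pages_within_budget(tool, 1000))
```

With a 1,000-credit budget that works out to 1,000 pages for scrape_template, 500 for scrape_structured, and 333 for extract_with_llm.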
Limitations
- Only 10 sites. If you need Etsy, eBay, TikTok, or others, you are waiting on the roadmap or rolling your own with scrape_structured/extract_with_llm. Request templates on Discord.
- Public data only. No template requires login. Profiles set to private, gated repos, and protected tweets will return what is publicly visible only.
- Layout changes happen. When a site ships a redesign, we usually have the template patched within 24 hours.
- Rate limits apply. Heavy-volume LinkedIn or Amazon scraping should pair scrape_template with stealth_mode (5 credits) and respect each site's robots.txt.
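Respecting robots.txt can be checked on your side before a URL is queued. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `allowed_by_robots` helper is hypothetical, and the robots.txt text is an illustrative example, not any real site's file.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only; fetch the real file from <site>/robots.txt.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(EXAMPLE_ROBOTS, "https://example.com/public/page"))
print(allowed_by_robots(EXAMPLE_ROBOTS, "https://example.com/private/page"))
```

Parsing text you already fetched keeps the check offline and lets you cache one robots.txt per site across a batch.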
Ready to skip the selectors? Start free with 1,000 credits -- enough for 1,000 template scrapes. New here? Read the v4.2.2 launch post for context, or the e-commerce extraction guide for a real-world workflow built around these templates.