Frequently Asked Questions

What sites does scrape_template support?

Ten sites in v4.2.2: Amazon, LinkedIn, GitHub, YouTube, Reddit, Hacker News, Stack Overflow, npm, Product Hunt, and Twitter/X. Each has a pre-built schema returning the fields you would normally want (product price/rating, profile name/role, repo stars/README, video transcript, etc.). More templates are coming in v4.3.

Is scraping LinkedIn legal?

In hiQ Labs v. LinkedIn (9th Circuit, 2022), the court held that scraping publicly accessible profile data likely does not violate the Computer Fraud and Abuse Act. That is narrower than a blanket "legal": LinkedIn's ToS still restricts automated access, and aggressive scraping or commercial resale can still trigger legal action and bans. Use scrape_template with the linkedin-profile template for public, low-frequency, non-resold use cases. Consult a lawyer if you are scraping at scale or for commercial products.

Can I add a custom template?

Not directly today, but we accept template requests on Discord and prioritize by demand. Sites with significant request volume (Etsy, eBay, TikTok, Instagram, Google Maps) are on the roadmap for v4.3. For one-off custom work, use scrape_structured (CSS selectors) or extract_with_llm (schema-driven).

What is the difference between scrape_template and scrape_structured?

scrape_template is for ten specific sites where we already maintain the schema -- you just pick the template name. scrape_structured is general-purpose: you provide CSS selectors for any site, and CrawlForge runs them. scrape_template is faster and cheaper (1 credit vs 2) when your target is one of the ten supported sites.

How fresh are the scrape_template schemas?

We monitor each supported site for layout changes and typically ship a template patch within 24 hours of any breaking change. Updates are transparent to your code -- you keep calling the same template name and the data shape stays the same. If you notice a regression, report it on Discord or GitHub.

What happens if a supported site changes its layout?

Calls keep returning JSON in the documented shape, even if the underlying selectors needed to change. We absorb the maintenance burden so you do not have to. If a layout change is severe enough to temporarily break a field, we mark that field nullable in the response until the patch is live (usually within 24 hours).
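Because a field can come back as null while a patch is pending, client code may want to check for missing values before using a record. A minimal Python sketch, assuming the response has already been parsed into a dict; the helper name `null_fields` and the sample record are illustrative and not part of CrawlForge:

```python
import json


def null_fields(record: dict, required: list) -> list:
    """Return the required keys that are absent or null in the record."""
    return [key for key in required if record.get(key) is None]


# Illustrative response: "price" is null, as it might be during a layout change.
raw = '{"asin": "B0CHX1W1XY", "title": "Example", "price": null, "rating": 4.7}'
record = json.loads(raw)

missing = null_fields(record, ["title", "price", "rating"])
if missing:
    print(f"skipping record, null fields: {missing}")
```

Checking explicitly for null keeps a temporarily broken field from being mistaken for a real value such as zero or an empty string.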

Scrape Amazon, LinkedIn & 8 More Sites With One Tool

Half the scraping requests we see at CrawlForge are for the same ten sites: Amazon, LinkedIn, GitHub, YouTube, Reddit, Hacker News, Stack Overflow, npm, Product Hunt, and Twitter/X. We got tired of watching people write the same CSS selectors over and over -- and watching those selectors break the next time the site updated its layout. So we did the work once, packaged it as scrape_template, and now you pay 1 credit and get structured JSON.

What Is scrape_template?
The 10 Supported Sites
Quick Start: Scrape an Amazon Product
LinkedIn Profiles (With Legal Notes)
GitHub Repos for AI Training Data
The Other Seven Templates
scrape_template vs scrape_structured vs extract_with_llm
Limitations

What Is scrape_template?

scrape_template is a single CrawlForge tool with ten pre-built site schemas. You pick the template, pass a URL, and get back structured JSON matching that site's natural shape. No CSS selectors. No HTML parsing. No schema definition.

The trade-off: you only get the ten sites we maintain. If you need something else, use scrape_structured (CSS-first) or extract_with_llm (LLM-first). For the steady stream of "I want product data from Amazon" requests, scrape_template is the shortest path. Need a multi-step workflow instead of a single site? See how to use the templates gallery.

It costs 1 credit per scrape -- the same as a basic fetch_url -- because we have already done the schema work upstream.

The 10 Supported Sites

Template	Returns	Best for	Example URL pattern
`amazon-product`	Title, price, rating, review count, images, ASIN, availability	Price monitoring, product research	`/dp/<ASIN>`
`linkedin-profile`	Name, headline, location, about, current company	Lead enrichment	`/in/<handle>`
`github-repo`	Stars, forks, language, topics, license, last updated	Repo analysis, AI training data	`/<owner>/<repo>`
`youtube-video`	Title, channel, views, duration, published, description	Content research	`/watch?v=<id>`
`reddit-thread`	Post title, score, author, subreddit, body	Community signals	`/r/<sub>/comments/<id>`
`hacker-news-front-page`	Front-page stories: title, URL, score, author, comments	Tech trend tracking	`news.ycombinator.com`
`stackoverflow-question`	Question, accepted answer, vote counts, tags	Developer Q&A mining	`/questions/<id>`
`npm-package`	Package metadata, weekly downloads, version, maintainers	Dependency analysis	`/package/<name>`
`producthunt-launch`	Product, tagline, upvotes, topics, website	Launch monitoring	`/posts/<slug>`
`tweet`	Text, author, URL, image	Social listening	`/<user>/status/<id>`
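If URLs arrive from mixed sources, the URL patterns in the table can be used to route each one to a template name. The following Python sketch is one possible reading of those patterns; the regexes, the `pick_template` helper, and the 10-character ASIN assumption are illustrative and not taken from CrawlForge itself:

```python
import re
from typing import Optional
from urllib.parse import urlparse

# (template name, host, path pattern), transcribed from the table above.
# The regexes are an illustrative interpretation of each URL pattern.
TEMPLATE_RULES = [
    ("amazon-product", "amazon.com", re.compile(r"/dp/[A-Z0-9]{10}")),
    ("linkedin-profile", "linkedin.com", re.compile(r"^/in/[^/]+/?$")),
    ("github-repo", "github.com", re.compile(r"^/[^/]+/[^/]+/?$")),
    ("youtube-video", "youtube.com", re.compile(r"^/watch$")),
    ("reddit-thread", "reddit.com", re.compile(r"^/r/[^/]+/comments/[^/]+")),
    ("hacker-news-front-page", "news.ycombinator.com", re.compile(r"^/?$")),
    ("stackoverflow-question", "stackoverflow.com", re.compile(r"^/questions/\d+")),
    ("npm-package", "npmjs.com", re.compile(r"^/package/.+")),
    ("producthunt-launch", "producthunt.com", re.compile(r"^/posts/[^/]+")),
    ("tweet", "x.com", re.compile(r"^/[^/]+/status/\d+")),
]


def pick_template(url: str) -> Optional[str]:
    """Return the matching template name, or None if no rule matches."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    for name, domain, pattern in TEMPLATE_RULES:
        # Accept the bare domain and any subdomain such as "www.".
        host_ok = host == domain or host.endswith("." + domain)
        if host_ok and pattern.search(parts.path):
            return name
    return None


print(pick_template("https://www.amazon.com/dp/B0CHX1W1XY"))
print(pick_template("https://www.etsy.com/listing/1"))
```

A `None` result is the cue to fall back to scrape_structured or extract_with_llm.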

Quick Start: Scrape an Amazon Product

```bash
crawlforge template amazon-product "https://www.amazon.com/dp/B0CHX1W1XY"
```

Output:

```json
{
  "asin": "B0CHX1W1XY",
  "title": "Logitech MX Master 3S Wireless Performance Mouse",
  "price": { "amount": 99.99, "currency": "USD" },
  "rating": 4.7,
  "review_count": 12483,
  "in_stock": true,
  "images": ["https://m.media-amazon.com/...", "..."],
  "credits_used": 1
}
```

From an MCP client like Claude Code:

"Use scrape_template with the amazon-product template to get the current price and rating for ASIN B0CHX1W1XY."

Claude picks the tool, formats the call, and returns the data. One credit.
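Once the JSON is in hand, a price-monitoring check takes only a few lines. The Python sketch below works on the sample output from this section (with the image list emptied for brevity); the `is_deal` helper and its thresholds are hypothetical, chosen only to illustrate the idea:

```python
import json

# Sample output from this section, with the image URLs omitted.
SAMPLE = """
{
  "asin": "B0CHX1W1XY",
  "title": "Logitech MX Master 3S Wireless Performance Mouse",
  "price": { "amount": 99.99, "currency": "USD" },
  "rating": 4.7,
  "review_count": 12483,
  "in_stock": true,
  "images": [],
  "credits_used": 1
}
"""


def is_deal(product: dict, max_price: float, min_rating: float = 4.0) -> bool:
    """True when the product is in stock, at or below max_price, and well rated."""
    price = product.get("price") or {}
    amount = price.get("amount")
    rating = product.get("rating")
    if amount is None or rating is None:
        return False  # treat missing data as "no deal" rather than guessing
    return bool(product.get("in_stock")) and amount <= max_price and rating >= min_rating


product = json.loads(SAMPLE)
print(is_deal(product, max_price=100.0))  # 99.99 <= 100.0, in stock, rated 4.7
print(is_deal(product, max_price=80.0))   # 99.99 > 80.0
```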

LinkedIn Profiles (With Legal Notes)

```bash
crawlforge template linkedin-profile "https://www.linkedin.com/in/satyanadella"
```

Output:

```json
{
  "name": "Satya Nadella",
  "headline": "Chairman and CEO at Microsoft",
  "location": "Redmond, Washington",
  "current_role": { "title": "CEO", "company": "Microsoft", "since": "2014-02" },
  "experience_count": 6,
  "skills_top": ["Leadership", "Strategy", "Cloud Computing"],
  "credits_used": 1
}
```

A note on LinkedIn scraping. LinkedIn's terms of service restrict automated access. In hiQ Labs v. LinkedIn (9th Circuit, 2022), the court held that scraping publicly accessible profile data likely does not violate the Computer Fraud and Abuse Act, but commercial use, login-required scraping, and aggressive frequency can still trigger legal action and ToS bans. Use scrape_template with the linkedin-profile template for public, low-frequency, non-resold data only.
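One way to keep the frequency low is to enforce a minimum gap between calls on the client side. The Python sketch below is a generic throttle and not a CrawlForge feature; the class name and the 30-second interval are arbitrary, illustrative choices:

```python
import time


class MinIntervalThrottle:
    """Blocks so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first one

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


throttle = MinIntervalThrottle(min_interval=30.0)  # 30 s is an arbitrary example
for url in ["https://www.linkedin.com/in/satyanadella"]:
    throttle.wait()  # the first call never sleeps
    # run the linkedin-profile template for `url` here
```

The clock and sleep functions are injectable so the throttle can be tested without real waiting.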

GitHub Repos for AI Training Data

```bash
crawlforge template github-repo "https://github.com/anthropics/anthropic-sdk-python"
```

Output:

```json
{
  "owner": "anthropics",
  "name": "anthropic-sdk-python",
  "stars": 1842,
  "forks": 287,
  "primary_language": "Python",
  "languages": { "Python": 98.4, "Makefile": 1.6 },
  "license": "MIT",
  "topics": ["claude", "anthropic", "sdk"],
  "readme_markdown": "# Anthropic Python SDK...",
  "last_commit_at": "2026-05-19T14:22:11Z",
  "credits_used": 1
}
```

This template is heavily used for AI training-data pipelines -- pulling READMEs at scale across thousands of repos. Pair it with batch_scrape to process a CSV of repo URLs.
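Before handing a CSV to a batch job, it can help to drop rows that are not repo root URLs and to remove duplicates. A minimal Python sketch of that cleanup step, assuming a CSV with a `url` column (the column name and the `load_repo_urls` helper are assumptions); it does not call batch_scrape itself:

```python
import csv
import io
from urllib.parse import urlparse


def load_repo_urls(csv_text: str, column: str = "url") -> list:
    """Return unique https://github.com/<owner>/<repo> URLs in first-seen order."""
    seen = set()
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        parts = urlparse(raw)
        segments = [s for s in parts.path.split("/") if s]
        if parts.hostname != "github.com" or len(segments) != 2:
            continue  # not a repo root URL, skip it
        normalized = f"https://github.com/{segments[0]}/{segments[1]}"
        if normalized.lower() not in seen:
            seen.add(normalized.lower())
            urls.append(normalized)
    return urls


sample = """url
https://github.com/anthropics/anthropic-sdk-python
https://github.com/anthropics/anthropic-sdk-python/
https://github.com/anthropics
not-a-url
"""
print(load_repo_urls(sample))
```

Deduplicating first avoids spending a credit twice on the same repo.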

The Other Seven Templates

YouTube -- title, channel, views, full transcript when available:

```bash
crawlforge template youtube-video "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```

Reddit -- post + comment tree:

```bash
crawlforge template reddit-thread "https://www.reddit.com/r/programming/comments/<id>"
```

Hacker News -- the front page as a list of stories:

```bash
crawlforge template hacker-news-front-page "https://news.ycombinator.com"
# returns up to 30 front-page stories; slice the top 10 with jq:
crawlforge template hacker-news-front-page "https://news.ycombinator.com" --json | jq '.stories[:10]'
```
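The jq slice keeps the first ten stories in page order. If you would rather rank by score, the same response can be post-processed in Python. This is a sketch under assumptions: the `stories` key comes from the jq filter, while the per-story key names `title` and `score` are guesses based on the fields listed in the table of supported sites:

```python
import json


def top_stories(payload: str, n: int = 10) -> list:
    """Return the n highest-scoring stories from a front-page response."""
    stories = json.loads(payload).get("stories", [])
    # A null or missing score sorts last.
    return sorted(stories, key=lambda s: s.get("score") or 0, reverse=True)[:n]


# Illustrative payload, not real Hacker News data.
sample = json.dumps({"stories": [
    {"title": "A", "score": 120},
    {"title": "B", "score": 340},
    {"title": "C", "score": None},
]})
print([s["title"] for s in top_stories(sample, n=2)])
```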

Stack Overflow -- question, accepted answer, top alternatives:

```bash
crawlforge template stackoverflow-question "https://stackoverflow.com/questions/12345678"
```

npm -- package metadata + weekly downloads:

```bash
crawlforge template npm-package "https://www.npmjs.com/package/next"
```

Product Hunt -- product, makers, upvotes:

```bash
crawlforge template producthunt-launch "https://www.producthunt.com/posts/crawlforge"
```

Twitter/X -- single tweet with engagement and replies:

```bash
crawlforge template tweet "https://x.com/elonmusk/status/<id>"
```

All return JSON. All cost 1 credit. All maintained centrally -- when LinkedIn or Amazon changes its layout, we update the template.

scrape_template vs scrape_structured vs extract_with_llm

A decision tree:

Is your target one of the 10 supported sites?
  Yes -> use scrape_template (1 credit, maintained for you)
  No
    Do you know the CSS selectors and are they stable?
      Yes -> use scrape_structured (2 credits, you maintain selectors)
      No  -> use extract_with_llm (3 credits, schema-based, layout-resilient)
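The same tree can be written as a small function, which is handy if an agent or pipeline has to pick the tool programmatically. A Python sketch; the function and parameter names are illustrative:

```python
def choose_tool(is_supported_site: bool, has_stable_selectors: bool) -> str:
    """Mirror the decision tree: template first, then selectors, then LLM."""
    if is_supported_site:
        return "scrape_template"
    if has_stable_selectors:
        return "scrape_structured"
    return "extract_with_llm"


print(choose_tool(is_supported_site=False, has_stable_selectors=True))
```

Note that the selector question only matters when the site is not one of the ten supported ones.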

Quick comparison:

	scrape_template	scrape_structured	extract_with_llm
Credits	1	2	3
Coverage	10 specific sites	Any site you can write selectors for	Any site
Maintenance	We maintain	You maintain	LLM adapts
Speed	Fast (cached schemas)	Fast	Slower (LLM call)
Best for	Popular sites, high volume	Specific known structure	Unknown or shifting structure
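The per-call credit costs make it easy to budget a mixed workload. A Python sketch using the credit figures from the comparison; the `estimate_credits` helper and the example plan are hypothetical:

```python
# Credits per call, as listed in the comparison above.
CREDITS_PER_CALL = {
    "scrape_template": 1,
    "scrape_structured": 2,
    "extract_with_llm": 3,
}


def estimate_credits(plan: dict) -> int:
    """Total credits for a plan mapping tool name to number of calls."""
    unknown = set(plan) - set(CREDITS_PER_CALL)
    if unknown:
        raise ValueError(f"unknown tools: {sorted(unknown)}")
    return sum(CREDITS_PER_CALL[tool] * calls for tool, calls in plan.items())


# 800*1 + 40*2 + 20*3 = 940
print(estimate_credits({"scrape_template": 800, "scrape_structured": 40, "extract_with_llm": 20}))
```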

Limitations

Only 10 sites. If you need Etsy, eBay, TikTok, or others, you are waiting on the roadmap or rolling your own with scrape_structured / extract_with_llm. Request templates on Discord.
Public data only. No template requires login. Private profiles, gated repos, and protected tweets return only what is publicly visible.
Layout changes happen. When a site ships a redesign, we usually have the template patched within 24 hours.
Rate limits apply. Heavy-volume LinkedIn or Amazon scraping should pair scrape_template with stealth_mode (5 credits) and respect each site's robots.txt.
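Checking robots.txt before queueing a URL can be done with Python's standard library. The sketch below parses robots.txt text that has already been fetched; the sample rules and the `allowed_by_robots` helper are illustrative, and in practice `RobotFileParser.set_url()` followed by `read()` would load the real file:

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules, not taken from any real site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(SAMPLE_ROBOTS, "https://example.com/private/page"))
print(allowed_by_robots(SAMPLE_ROBOTS, "https://example.com/public/page"))
```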

Ready to skip the selectors? Start free with 1,000 credits -- enough for 1,000 template scrapes. New here? Read the v4.2.2 launch post for context, or the e-commerce extraction guide for a real-world workflow built around these templates.


About the Author

CrawlForge Team
