extract_content

Extracts the main article content from a webpage using readability detection, removing boilerplate such as ads, navigation bars, and footers. The output is clean content suited to LLM input and content analysis.

Use Cases

Clean Content for LLMs

Extract article content without ads and navigation for feeding into AI models

Article Extraction

Get the main article text from news sites, blogs, and content platforms

Removing Boilerplate

Strip away ads, popups, headers, footers, and other non-content elements

Content Aggregation

Build RSS readers, news aggregators, and content curation platforms

Reader Mode

Create distraction-free reading experiences like browser reader modes

Research & Analysis

Extract article text for sentiment analysis, NLP, and research projects

Endpoint

POST /api/v1/tools/extract_content

Auth Required

2 req/s on Free plan

2 credits
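
To stay under the 2 req/s Free plan limit, a client can space its own calls. The following is a minimal sketch, assuming the limit can be respected by leaving at least 0.5 seconds between requests; how the server actually measures the rate is not stated on this page. The `Throttle` class and its parameters are hypothetical names for illustration.

```python
import time


class Throttle:
    """Client-side spacing of calls to at most `max_per_second` per second."""

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        # Assumption: an even spacing of 1 / rate seconds satisfies the limit.
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self._interval
        return max(delay, 0.0)


# Usage: create one Throttle and call throttle.wait() before each request.
throttle = Throttle(max_per_second=2.0)
```

The clock and sleep functions are injectable so the spacing logic can be checked without real waiting.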

Parameters

Name	Type	Required	Default	Description	Example
url	string	Required	-	The URL of the webpage to extract content from	https://example.com/article
options	object	Optional	-	Content extraction options	{"includeImages": true, "includeLinks": true}
options.includeImages	boolean	Optional	true	Include images in the extracted content	true
options.includeLinks	boolean	Optional	false	Preserve links in the extracted content	false
options.minTextLength	number	Optional	100	Minimum text length (in characters) to consider as main content	200
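
The parameters above can be modeled on the client side so that the documented defaults are explicit. This is a sketch only: `ExtractOptions` and `build_body` are hypothetical helper names, and the non-negative check on `minTextLength` is an assumption rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractOptions:
    # Defaults mirror the Default column of the parameter table.
    include_images: bool = True
    include_links: bool = False
    min_text_length: int = 100

    def to_payload(self) -> dict:
        """Convert to the camelCase keys used by the API."""
        # Assumption: a negative minimum length makes no sense, so reject it early.
        if self.min_text_length < 0:
            raise ValueError("min_text_length must be non-negative")
        return {
            "includeImages": self.include_images,
            "includeLinks": self.include_links,
            "minTextLength": self.min_text_length,
        }


def build_body(url: str, options: ExtractOptions = None) -> dict:
    """Build the JSON body; `options` is omitted entirely when not given."""
    body = {"url": url}
    if options is not None:
        body["options"] = options.to_payload()
    return body


body = build_body(
    "https://example.com/article",
    ExtractOptions(include_links=True, min_text_length=200),
)
```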

Request Examples
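
A minimal Python sketch of a request, using only the standard library. The host `https://api.example.com` is a placeholder and the `Authorization: Bearer` header is an assumption: this page gives only the path and states that authentication is required, so substitute the host and auth scheme from your account.

```python
import json
import urllib.request

# Assumption: placeholder host; only the path below comes from this page.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/v1/tools/extract_content"


def build_extract_request(url, api_key, options=None):
    """Build (but do not send) a POST request for extract_content."""
    payload = {"url": url}
    if options:
        payload["options"] = options
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the page only says "Auth Required".
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_extract_request(
    "https://example.com/article",
    api_key="YOUR_API_KEY",
    options={"includeImages": True, "includeLinks": True},
)
# To send it: urllib.request.urlopen(request, timeout=30)
```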


Response Example

200 OK (680 ms)

{
  "success": true,
  "data": {
    "title": "The Future of Web Scraping: AI and Machine Learning",
    "content": "# The Future of Web Scraping\n\nWeb scraping has evolved significantly over the past decade...\n\n## Machine Learning Integration\n\nModern scraping tools now leverage AI to adapt to website changes...",
    "author": "John Doe",
    "publishDate": "2024-01-15T10:30:00Z",
    "images": [
      {
        "src": "https://example.com/images/hero.jpg",
        "alt": "Web scraping visualization",
        "width": 1200,
        "height": 630
      }
    ],
    "readingTime": 8,
    "wordCount": 1847,
    "excerpt": "Web scraping has evolved significantly over the past decade with the integration of AI and machine learning..."
  },
  "credits_used": 2,
  "credits_remaining": 998,
  "processing_time": 680
}

Field Descriptions

data.title: Extracted article title

data.content: Main article content in Markdown format (clean, no ads/navbars)

data.author: Article author (if available)

data.publishDate: Article publication date (ISO 8601 format)

data.images: Array of images with src, alt text, and dimensions

data.readingTime: Estimated reading time in minutes (based on 200 wpm)

data.wordCount: Total word count of the extracted content

credits_used: Credits deducted for this request (2 per extraction)
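
The fields described above can be read into a typed structure. The sketch below parses a shortened copy of the response example; `ExtractedArticle` and `parse_response` are hypothetical names, and treating `author` and `publishDate` as possibly missing is an assumption (the page marks only `author` as "if available").

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedArticle:
    title: str
    content: str
    author: Optional[str]
    publish_date: Optional[str]  # kept as the raw ISO 8601 string
    images: list
    reading_time: int
    word_count: int


def parse_response(raw: str) -> ExtractedArticle:
    """Parse a successful extract_content response body."""
    body = json.loads(raw)
    if not body.get("success"):
        raise ValueError("extract_content did not report success")
    data = body["data"]
    return ExtractedArticle(
        title=data["title"],
        content=data["content"],
        # Assumption: optional metadata may be absent, so use .get().
        author=data.get("author"),
        publish_date=data.get("publishDate"),
        images=data.get("images", []),
        reading_time=data["readingTime"],
        word_count=data["wordCount"],
    )


# Shortened copy of the response example from this page.
sample = json.dumps({
    "success": True,
    "data": {
        "title": "The Future of Web Scraping: AI and Machine Learning",
        "content": "# The Future of Web Scraping\n\nWeb scraping has evolved...",
        "author": "John Doe",
        "publishDate": "2024-01-15T10:30:00Z",
        "images": [{"src": "https://example.com/images/hero.jpg",
                    "alt": "Web scraping visualization",
                    "width": 1200, "height": 630}],
        "readingTime": 8,
        "wordCount": 1847,
    },
    "credits_used": 2,
})
article = parse_response(sample)
```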

Error Handling

No Content Found (422 Unprocessable Entity)

Unable to extract main content from the page. The page may be empty or have no readable content.

Invalid URL (400 Bad Request)

The URL format is invalid. Ensure it includes the protocol (http:// or https://).

Page Not Accessible (404 Not Found)

The URL returned a 404 error. Verify the URL is correct and publicly accessible.

Insufficient Credits (402 Payment Required)

Your account doesn't have enough credits (need 2). Purchase more credits or upgrade your plan.

Rate Limit Exceeded (429 Too Many Requests)

You've exceeded your plan's rate limit. Wait a moment or upgrade your plan for higher limits.
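
The error cases above can be handled with a small lookup plus a retry policy. This is a sketch under stated assumptions: the helper names are hypothetical, retrying only on 429 is a design choice rather than documented behavior, and the backoff delays are illustrative values. The pre-flight URL check mirrors the 400 case so obviously invalid URLs never reach the API.

```python
from urllib.parse import urlparse

# Status codes and labels as listed in the Error Handling section.
ERROR_LABELS = {
    400: "Invalid URL",
    402: "Insufficient Credits",
    404: "Page Not Accessible",
    422: "No Content Found",
    429: "Rate Limit Exceeded",
}


def has_http_scheme(url: str) -> bool:
    """Pre-flight check: the URL must include http:// or https:// and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def describe_error(status: int) -> str:
    """Map an HTTP status code to the label used in this documentation."""
    return ERROR_LABELS.get(status, f"Unexpected status {status}")


def should_retry(status: int) -> bool:
    # Assumption: only rate-limit responses are worth retrying unchanged;
    # the other errors need a different URL or more credits.
    return status == 429


def backoff_delays(retries: int = 3, base: float = 0.5) -> list:
    """Exponential backoff schedule in seconds (illustrative values)."""
    return [base * (2 ** attempt) for attempt in range(retries)]
```

With the defaults, a caller would wait 0.5, 1, and 2 seconds between successive retries of a rate-limited request.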

Pro Tip: extract_content uses Mozilla's Readability algorithm, the same technology behind Firefox's Reader View. It works best on article-style pages with clear content structure.

Credit Cost

2 credits per request

Each successful extract_content request costs 2 credits, regardless of content length.

Free Plan: 1,000 credits/month = 500 extractions

Hobby Plan: 5,000 credits/month = 2,500 extractions ($19/mo)

Professional Plan: 50,000 credits/month = 25,000 extractions ($99/mo)

Business Plan: 250,000 credits/month = 125,000 extractions ($399/mo)
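
The extraction counts above follow from dividing each plan's monthly credits by the 2-credit cost. A short sketch of that arithmetic, using the plan figures from this page; the function names are hypothetical:

```python
CREDITS_PER_EXTRACTION = 2

# Monthly credit allowances as listed above.
PLAN_CREDITS = {
    "Free": 1_000,
    "Hobby": 5_000,
    "Professional": 50_000,
    "Business": 250_000,
}


def extractions_per_month(monthly_credits: int) -> int:
    """Number of extract_content requests a credit allowance covers."""
    return monthly_credits // CREDITS_PER_EXTRACTION


def credits_needed(extractions: int) -> int:
    """Credits required for a planned number of extractions."""
    return extractions * CREDITS_PER_EXTRACTION


capacity = {plan: extractions_per_month(c) for plan, c in PLAN_CREDITS.items()}
```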

Related Tools

extract_text

Extract all text from HTML (includes boilerplate) (1 credit)

summarize_content

Summarize the extracted content (4 credits)

