extract_content
Extract the main article content from a webpage using readability detection, removing boilerplate elements such as ads, navbars, and footers. Suited to producing clean text for LLMs and content analysis.
Use Cases
Clean Content for LLMs
Extract article content without ads and navigation for feeding into AI models
Article Extraction
Get the main article text from news sites, blogs, and content platforms
Removing Boilerplate
Strip away ads, popups, headers, footers, and other non-content elements
Content Aggregation
Build RSS readers, news aggregators, and content curation platforms
Reader Mode
Create distraction-free reading experiences like browser reader modes
Research & Analysis
Extract article text for sentiment analysis, NLP, and research projects
Endpoint
/api/v1/tools/extract_content
Parameters
Name | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Required | - | The URL of the webpage to extract content from. Example: https://example.com/article |
options | object | Optional | - | Content extraction options. Example: {"includeImages": true, "includeLinks": true} |
options.includeImages | boolean | Optional | true | Include images in the extracted content. Example: true |
options.includeLinks | boolean | Optional | false | Preserve links in the extracted content. Example: false |
options.minTextLength | number | Optional | 100 | Minimum text length (in characters) to consider as main content. Example: 200 |
Request Examples
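No request is shown on this page, so the following is only a minimal sketch of how a call might be assembled from the parameters above. The host `https://api.example.com`, the POST method with a JSON body, and Bearer-token authentication are all assumptions made for illustration, as are the helper names `build_payload`, `build_request`, and `extract_content`; check your account documentation for the real values.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): the host, the POST method,
# and Bearer-token authentication are placeholders for illustration.
BASE_URL = "https://api.example.com"  # hypothetical host
ENDPOINT = "/api/v1/tools/extract_content"


def build_payload(url, include_images=True, include_links=False, min_text_length=100):
    """Build the JSON body from the documented parameters and their defaults."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include the protocol (http:// or https://)")
    return {
        "url": url,
        "options": {
            "includeImages": include_images,
            "includeLinks": include_links,
            "minTextLength": min_text_length,
        },
    }


def build_request(url, api_key, **options):
    """Prepare (but do not send) the HTTP request."""
    body = json.dumps(build_payload(url, **options)).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def extract_content(url, api_key, **options):
    """Send the request and return the decoded JSON response."""
    request = build_request(url, api_key, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting payload construction from sending keeps the documented defaults in one place and lets the request be inspected without touching the network.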
Response Example
{
  "success": true,
  "data": {
    "title": "The Future of Web Scraping: AI and Machine Learning",
    "content": "# The Future of Web Scraping\n\nWeb scraping has evolved significantly over the past decade...\n\n## Machine Learning Integration\n\nModern scraping tools now leverage AI to adapt to website changes...",
    "author": "John Doe",
    "publishDate": "2024-01-15T10:30:00Z",
    "images": [
      {
        "src": "https://example.com/images/hero.jpg",
        "alt": "Web scraping visualization",
        "width": 1200,
        "height": 630
      }
    ],
    "readingTime": 8,
    "wordCount": 1847,
    "excerpt": "Web scraping has evolved significantly over the past decade with the integration of AI and machine learning..."
  },
  "credits_used": 2,
  "credits_remaining": 998,
  "processing_time": 680
}
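A response of this shape can be read into a small typed structure before use. The sketch below is illustrative only: the `Extraction` class and `parse_extraction` helper are hypothetical names, the sample is a shortened copy of the response above, and treating a missing or false `success` flag as a failure is an assumption about how unsuccessful calls look.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Extraction:
    """Hypothetical container for the fields shown in the sample response."""
    title: str
    content: str  # Markdown text
    author: Optional[str] = None
    publish_date: Optional[str] = None  # ISO 8601 string
    images: List[dict] = field(default_factory=list)
    reading_time: Optional[int] = None  # minutes
    word_count: Optional[int] = None
    credits_used: Optional[int] = None


def parse_extraction(raw: str) -> Extraction:
    """Decode a JSON response body and pick out the article fields."""
    payload = json.loads(raw)
    # Assumption: an unsuccessful call does not carry "success": true.
    if not payload.get("success"):
        raise ValueError("extraction was not successful")
    data = payload["data"]
    return Extraction(
        title=data["title"],
        content=data["content"],
        author=data.get("author"),  # only present if available
        publish_date=data.get("publishDate"),
        images=data.get("images", []),
        reading_time=data.get("readingTime"),
        word_count=data.get("wordCount"),
        credits_used=payload.get("credits_used"),
    )


# Shortened copy of the sample response, for demonstration.
SAMPLE = json.dumps({
    "success": True,
    "data": {
        "title": "The Future of Web Scraping: AI and Machine Learning",
        "content": "# The Future of Web Scraping\n\nWeb scraping has evolved...",
        "author": "John Doe",
        "publishDate": "2024-01-15T10:30:00Z",
        "images": [{"src": "https://example.com/images/hero.jpg",
                    "alt": "Web scraping visualization",
                    "width": 1200, "height": 630}],
        "readingTime": 8,
        "wordCount": 1847,
    },
    "credits_used": 2,
})

result = parse_extraction(SAMPLE)
print(result.title)
print(result.author, result.word_count)
```

Optional fields such as the author are read with `.get()` so that pages without that metadata do not raise a `KeyError`.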
data.title
Extracted article title
data.content
Main article content in Markdown format (clean, no ads/navbars)
data.author
Article author (if available)
data.publishDate
Article publication date (ISO 8601 format)
data.images
Array of images with src, alt text, and dimensions
data.readingTime
Estimated reading time in minutes (based on 200 wpm)
data.wordCount
Total word count of the extracted content
credits_used
Credits deducted for this request (2 per extraction)
Error Handling
No Content Found (422 Unprocessable Entity)
Unable to extract main content from the page. The page may be empty or have no readable content.
Invalid URL (400 Bad Request)
The URL format is invalid. Ensure it includes the protocol (http:// or https://).
Page Not Accessible (404 Not Found)
The URL returned a 404 error. Verify the URL is correct and publicly accessible.
Insufficient Credits (402 Payment Required)
Your account doesn't have enough credits (need 2). Purchase more credits or upgrade your plan.
Rate Limit Exceeded (429 Too Many Requests)
You've exceeded your plan's rate limit. Wait a moment or upgrade your plan for higher limits.
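The status codes listed above suggest a simple client-side policy: wait and retry on 429, and surface every other error immediately, since retrying will not fix a bad URL or an empty credit balance. The sketch below is one way to express that, under stated assumptions: `send` is a hypothetical callable returning a `(status, body)` pair, success is assumed to be HTTP 200, and the exponential backoff delays are illustrative rather than values documented by the API.

```python
import time

# Status codes and names as listed in the Error Handling section.
ERRORS = {
    400: "Invalid URL",
    402: "Insufficient Credits",
    404: "Page Not Accessible",
    422: "No Content Found",
    429: "Rate Limit Exceeded",
}


class ExtractionError(Exception):
    """Raised when the API answers with a non-success status."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body); retry only on 429 with exponential backoff.

    `send` is a hypothetical callable supplied by the caller. The delays
    (base_delay, doubled on each retry) are an assumption, not documented limits.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:  # assumed success status
            return body
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
            continue
        # Non-retryable error, or the final 429 after all attempts.
        raise ExtractionError(status, ERRORS.get(status, "Unexpected error"))
```

Passing `sleep` as a parameter makes the backoff easy to replace in tests or in an async wrapper.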
Credit Cost
Free Plan: 1,000 credits/month = 500 extractions
Hobby Plan: 5,000 credits/month = 2,500 extractions ($19/mo)
Professional Plan: 50,000 credits/month = 25,000 extractions ($99/mo)
Business Plan: 250,000 credits/month = 125,000 extractions ($399/mo)
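The extraction counts above follow from dividing each plan's monthly credits by the 2 credits charged per extraction. A short sketch of that arithmetic, with `extractions_available` as a hypothetical helper name:

```python
CREDITS_PER_EXTRACTION = 2  # per the credits_used description

# Monthly credit allowances as listed for each plan.
PLANS = {
    "Free": 1_000,
    "Hobby": 5_000,
    "Professional": 50_000,
    "Business": 250_000,
}


def extractions_available(credits):
    """Number of whole extractions a credit balance can pay for."""
    return credits // CREDITS_PER_EXTRACTION


for name, credits in PLANS.items():
    print(f"{name}: {extractions_available(credits):,} extractions")
```

The same helper works on a live balance, for example the `credits_remaining` value returned with each response.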