Basic Tool (1 credit)

extract_text

Extracts clean, readable text from HTML. Scripts, styles, and boilerplate are removed automatically, and the main text content is preserved.

Use Cases

Article Extraction for LLMs

Extract clean article text for summarization, analysis, or AI processing

Content Analysis

Get plain text for word count, readability analysis, or sentiment detection

Clean Text for Summarization

Remove HTML noise before passing to summarization models

Boilerplate Removal

Remove ads, navigation, and other non-content elements

Endpoint

POST /api/v1/tools/extract_text
Auth Required
2 req/s on Free plan
1 credit

Parameters

Each entry lists the name, type, whether it is required, and the default, followed by the description.

html: string, optional, no default
HTML content to extract text from (provide either html or url)
Example: <html><body><h1>Hello World</h1></body></html>

url: string, optional, no default
URL to fetch and extract text from (provide either html or url)
Example: https://example.com/article

selector: string, optional, no default
CSS selector to target specific elements (default: entire page)
Example: article, .content, #main

clean: boolean, optional, default true
Remove extra whitespace and normalize formatting
Example: true

preserve_links: boolean, optional, default false
Include links in the extracted text with their URLs
Example: false

preserve_formatting: boolean, optional, default false
Preserve basic HTML formatting (paragraphs, line breaks)
Example: false

max_length: number, optional, no default
Maximum length of extracted text (will truncate with ...)
Example: 5000
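
The parameter rules above can be sketched as a small request-body builder. This is only an illustration: build_extract_text_payload is a hypothetical helper, not part of any official client, and it assumes the parameters are sent as a JSON object under the names listed above. How the API behaves when both html and url are supplied is not stated here, so the sketch does not reject that case.

```python
from typing import Any, Dict, Optional


def build_extract_text_payload(
    html: Optional[str] = None,
    url: Optional[str] = None,
    selector: Optional[str] = None,
    clean: bool = True,
    preserve_links: bool = False,
    preserve_formatting: bool = False,
    max_length: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable request body (hypothetical helper)."""
    # The docs require at least one input source: html or url.
    if html is None and url is None:
        raise ValueError("provide either html or url")

    # Boolean flags mirror the documented defaults.
    payload: Dict[str, Any] = {
        "clean": clean,
        "preserve_links": preserve_links,
        "preserve_formatting": preserve_formatting,
    }
    # Parameters without a documented default are only sent when set.
    optional_fields = (
        ("html", html),
        ("url", url),
        ("selector", selector),
        ("max_length", max_length),
    )
    for name, value in optional_fields:
        if value is not None:
            payload[name] = value
    return payload
```

Calling build_extract_text_payload(url="https://example.com/article", selector="article") yields a body with url, selector, and the three boolean flags at their defaults.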

Request Examples

cURL - Extract from URL

terminal (Bash)

TypeScript - Extract from HTML

extractText.ts (TypeScript)

Python - Extract with Selector

extract_text.py (Python)
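
A minimal sketch of a selector-based request using only the Python standard library. Several details are assumptions rather than documented facts: the host in BASE_URL is a placeholder (this section only gives the path), the Bearer-token header is a guess at what "Auth Required" means, and the body is assumed to be JSON. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: the real host is not given in this section; replace it.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/v1/tools/extract_text"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token auth; check the API's auth docs.
            "Authorization": f"Bearer {api_key}",
        },
    )


def extract_text(payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = make_request(payload, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a valid key and the correct host, extract_text({"url": "https://example.com/article", "selector": "article"}, api_key) would return the decoded response described in the next section.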

Response Example

200 OK (180 ms)
{
  "success": true,
  "data": {
    "text": "Article Title\n\nThis is the main content of the article. It contains useful information that has been extracted from the HTML.\n\nLinks:\nRelated Article (/related)",
    "metadata": {
      "title": "Article Title - Example Site",
      "description": "Meta description of the article",
      "word_count": 248,
      "character_count": 1432,
      "selector_used": "article",
      "links_preserved": true,
      "formatting_preserved": false
    }
  },
  "credits_used": 1,
  "credits_remaining": 999,
  "processing_time": 180
}
Field Descriptions
data.text: The extracted plain text content
data.metadata.word_count: Total number of words in the extracted text
data.metadata.character_count: Total number of characters
data.metadata.selector_used: The CSS selector that was applied
credits_used: Credits deducted for this request (1 per extraction)
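
Reading the documented fields out of a response might look like the sketch below. ExtractResult and parse_extract_response are hypothetical names, and the code assumes the response always has the shape shown in the example above. selector_used is treated as optional because this section does not say what is returned when no selector is sent.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractResult:
    text: str
    word_count: int
    character_count: int
    selector_used: Optional[str]
    credits_used: int


def parse_extract_response(raw: str) -> ExtractResult:
    """Pull the documented fields out of a raw JSON response string."""
    doc = json.loads(raw)
    if not doc.get("success"):
        raise ValueError("response did not report success")
    data = doc["data"]
    meta = data["metadata"]
    return ExtractResult(
        text=data["text"],
        word_count=meta["word_count"],
        character_count=meta["character_count"],
        # Assumption: this key may be missing when no selector was sent.
        selector_used=meta.get("selector_used"),
        credits_used=doc["credits_used"],
    )
```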

Error Handling

Missing Input (400 Bad Request)

Neither html nor url was provided. You must provide at least one.

Invalid Selector (400 Bad Request)

The CSS selector is invalid or matches no elements. Verify your selector syntax.

URL Fetch Failed (500 Internal Server Error)

Failed to fetch the URL. Check that the URL is accessible and returns HTML.
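
One way a client could react to these status codes is sketched below. The error response body is not documented in this section, so the sketch relies on the HTTP status code alone. ExtractTextError, raise_for_status, and is_retryable are hypothetical names, and treating a 500 as worth retrying is an assumption, not documented behavior.

```python
class ExtractTextError(Exception):
    """Raised for non-2xx responses from extract_text (hypothetical)."""

    def __init__(self, status: int, hint: str) -> None:
        super().__init__(f"HTTP {status}: {hint}")
        self.status = status
        self.hint = hint


# Hints paraphrase the documented error descriptions.
HINTS = {
    400: "Missing input or invalid selector: send html or url, "
         "and verify the selector syntax.",
    500: "URL fetch failed: check that the URL is accessible "
         "and returns HTML.",
}


def raise_for_status(status: int) -> None:
    """Return quietly on 2xx, otherwise raise with a hint."""
    if 200 <= status < 300:
        return
    raise ExtractTextError(status, HINTS.get(status, "Unexpected error."))


def is_retryable(status: int) -> bool:
    # Assumption: a failed fetch (5xx) may be transient, while a 400
    # needs a corrected request before it can succeed.
    return status >= 500
```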

Credit Cost

1 credit per request
Each text extraction costs 1 credit, regardless of content size or selector complexity.
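
Because the cost is flat, budgeting a batch is simple arithmetic. The sketch below uses the documented figures of 1 credit per request and 2 requests per second on the Free plan. The function names are illustrative, and the duration is a rough lower bound that ignores network and processing time.

```python
CREDITS_PER_REQUEST = 1      # documented: 1 credit per request
FREE_PLAN_REQ_PER_SEC = 2    # documented: 2 req/s on the Free plan


def credits_needed(num_requests: int) -> int:
    """Credits consumed by a batch, independent of content size."""
    return num_requests * CREDITS_PER_REQUEST


def max_requests(credits_remaining: int) -> int:
    """How many extractions the remaining balance can cover."""
    return credits_remaining // CREDITS_PER_REQUEST


def min_batch_seconds(num_requests: int,
                      rate: float = FREE_PLAN_REQ_PER_SEC) -> float:
    """Rough lower bound on batch duration under the rate limit."""
    return num_requests / rate
```

For example, 250 extractions cost 250 credits and need roughly 125 seconds or more at 2 req/s.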

Related Tools