Advanced Tool | 2 credits per page

process_document

Process PDF, DOCX, and TXT documents with text extraction, image extraction, and optional OCR support. Perfect for parsing academic papers, invoices, forms, and multi-format document processing.

Use Cases

Document Parsing

Extract text and metadata from PDFs, Word documents, and text files

Academic Research

Process research papers, theses, and academic publications for analysis

Invoice Processing

Extract structured data from invoices, receipts, and financial documents

Form Extraction

Parse application forms, surveys, and questionnaires

Legal Documents

Extract text from contracts, agreements, and legal filings

Scanned Document OCR

Convert scanned images and PDFs to searchable text with OCR

Endpoint

POST /api/v1/tools/process_document

Auth Required

2 req/s on Free plan

2 credits per page

Credit Cost: 2 credits per page + 2 additional credits per page if OCR is enabled. A 10-page PDF costs 20 credits (or 40 credits with OCR).
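
The pricing rule above is simple arithmetic. A minimal sketch follows; `estimate_credits` is a hypothetical helper name used for illustration and is not part of the API.

```python
def estimate_credits(page_count: int, ocr_enabled: bool = False) -> int:
    """Estimate credits: 2 per page, plus 2 more per page when OCR is on."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    per_page = 2 + (2 if ocr_enabled else 0)
    return page_count * per_page


print(estimate_credits(10))                    # 10-page PDF without OCR
print(estimate_credits(10, ocr_enabled=True))  # same PDF with OCR
```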

Parameters

Name	Type	Required	Default	Description
source	string	Required	-	The document source (URL or file path depending on sourceType) Example: https://example.com/document.pdf
sourceType	string	Required	-	Type of source: "url", "pdf_url", "file", or "pdf_file" Example: pdf_url
options	object	Optional	-	Processing options Example: {"extractImages": true, "ocrEnabled": false}
options.extractImages	boolean	Optional	false	Whether to extract images from the document Example: true
options.ocrEnabled	boolean	Optional	false	Enable OCR for scanned documents (adds 2 credits per page) Example: false
options.maxPages	number	Optional	-	Maximum number of pages to process (default: all pages) Example: 10

Request Examples
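
A minimal Python sketch of building a request from the parameters listed above. Several details are assumptions, not taken from this page: the host in `BASE_URL` is a placeholder, the body is assumed to be JSON, and the `Authorization: Bearer` header is a guess at the auth scheme. `build_request` is a hypothetical helper name. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: placeholder host. This page only documents the path.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/v1/tools/process_document"

ALLOWED_SOURCE_TYPES = {"url", "pdf_url", "file", "pdf_file"}


def build_request(source, source_type, api_key, options=None):
    """Build (but do not send) a POST request for process_document."""
    if source_type not in ALLOWED_SOURCE_TYPES:
        raise ValueError(f"sourceType must be one of {sorted(ALLOWED_SOURCE_TYPES)}")
    payload = {"source": source, "sourceType": source_type}
    if options:
        payload["options"] = options
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request(
    "https://example.com/document.pdf",
    "pdf_url",
    api_key="YOUR_API_KEY",
    options={"extractImages": True, "ocrEnabled": False, "maxPages": 10},
)
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.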

Response Example

200 OK (3450 ms)

{
  "success": true,
  "data": {
    "pages": [
      {
        "pageNumber": 1,
        "text": "Introduction\n\nThis research paper explores the applications of machine learning...",
        "wordCount": 523,
        "images": [
          "image_1_base64..."
        ]
      },
      {
        "pageNumber": 2,
        "text": "Methodology\n\nOur approach involves collecting data from multiple sources...",
        "wordCount": 612,
        "images": []
      }
    ],
    "metadata": {
      "title": "Machine Learning Applications in Healthcare",
      "author": "Dr. Jane Smith",
      "creationDate": "2024-01-15",
      "pageCount": 10,
      "fileSize": 2456789,
      "format": "PDF"
    },
    "extractedText": "Introduction\n\nThis research paper explores the applications of machine learning...\n\nMethodology\n\nOur approach involves...",
    "images": [
      "image_1_base64..."
    ],
    "totalPages": 10,
    "processedPages": 10
  },
  "credits_used": 20,
  "credits_remaining": 980,
  "processing_time": 3450
}

Field Descriptions

data.pages: Array of page objects with text and images per page

data.metadata: Document metadata (title, author, dates, format)

data.extractedText: Combined text from all pages

data.images: Array of extracted images in base64 format (if extractImages: true)

data.totalPages: Total number of pages in the document

credits_used: Credits deducted (2 per page × 10 pages = 20 credits)

processing_time: Total processing time in milliseconds
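
As an illustration of how these fields might be consumed, here is a minimal sketch that summarizes a decoded response. `summarize_response` is a hypothetical helper, and `sample` is a trimmed copy of the response example shown on this page.

```python
def summarize_response(resp: dict) -> dict:
    """Pull a few headline numbers out of a process_document response."""
    if not resp.get("success"):
        raise RuntimeError("request did not succeed")
    data = resp["data"]
    pages = data["pages"]
    return {
        "title": data["metadata"].get("title"),
        "pages_returned": len(pages),
        "total_words": sum(p.get("wordCount", 0) for p in pages),
        # Page numbers that carry at least one extracted image.
        "pages_with_images": [p["pageNumber"] for p in pages if p.get("images")],
        "credits_used": resp.get("credits_used"),
    }


# Trimmed copy of the sample response (text fields shortened).
sample = {
    "success": True,
    "data": {
        "pages": [
            {"pageNumber": 1, "text": "Introduction...", "wordCount": 523,
             "images": ["image_1_base64..."]},
            {"pageNumber": 2, "text": "Methodology...", "wordCount": 612,
             "images": []},
        ],
        "metadata": {"title": "Machine Learning Applications in Healthcare",
                     "pageCount": 10, "format": "PDF"},
        "totalPages": 10,
        "processedPages": 10,
    },
    "credits_used": 20,
    "credits_remaining": 980,
    "processing_time": 3450,
}

summary = summarize_response(sample)
print(summary)
```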

Error Handling

Unsupported Format (400 Bad Request)

The document format is not supported. Supported formats: PDF, DOCX, TXT.

File Too Large (413 Payload Too Large)

The document exceeds the maximum file size of 50 MB. Split large documents into smaller files.

Corrupted Document (422 Unprocessable Entity)

The document is corrupted or password-protected. Ensure the file is valid and not encrypted.

Insufficient Credits (402 Payment Required)

Your account doesn't have enough credits for this document (need {pageCount} × 2 credits, or {pageCount} × 4 with OCR enabled). Purchase more credits.

Rate Limit Exceeded (429 Too Many Requests)

You've exceeded your plan's rate limit. Wait a moment or upgrade your plan for higher limits.
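
One way a client might act on these status codes is sketched below: retry only on 429 with exponential backoff, and raise on everything else. This is an illustrative pattern, not documented client behavior. `ProcessDocumentError` and `call_with_retry` are hypothetical names, and `send` stands for any callable that returns a `(status, body)` pair.

```python
import time

# Status codes and meanings as listed in this section.
ERRORS = {
    400: "Unsupported format",
    402: "Insufficient credits",
    413: "File too large",
    422: "Corrupted or password-protected document",
    429: "Rate limit exceeded",
}


class ProcessDocumentError(Exception):
    """Raised for any non-200 status that is not retried."""

    def __init__(self, status):
        super().__init__(f"{status}: {ERRORS.get(status, 'Unexpected error')}")
        self.status = status


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns 200; back off and retry only on 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5 s, then 1 s, then 2 s, ...
            continue
        raise ProcessDocumentError(status)


# Demo with a fake sender: one 429, then success. No real sleeping.
fake = iter([(429, None), (200, {"success": True})])
print(call_with_retry(lambda: next(fake), sleep=lambda seconds: None))
```

Errors such as 400, 402, 413, and 422 are not retried because repeating the same request would not change the outcome.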

Pro Tip: Use the maxPages parameter to limit credit usage when processing large documents. Process in batches if you only need specific sections.
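
To turn a credit budget into a `maxPages` value, divide the budget by the per-page cost. A minimal sketch; `max_pages_for_budget` is a hypothetical helper, and it assumes billing follows the per-page rates stated on this page.

```python
def max_pages_for_budget(credit_budget: int, ocr_enabled: bool = False) -> int:
    """Largest page count whose cost fits within credit_budget."""
    per_page = 4 if ocr_enabled else 2  # 2 credits per page, +2 with OCR
    return max(credit_budget, 0) // per_page


# Cap an OCR job at 100 credits.
options = {
    "ocrEnabled": True,
    "maxPages": max_pages_for_budget(100, ocr_enabled=True),
}
print(options)
```

If the budget is below the cost of one page, the helper returns 0; in that case it is safer to skip the request than to send `maxPages: 0`, since this page does not say how that value is handled.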

Credit Cost

2 credits per page (4 credits with OCR)

Each page processed costs 2 credits. Enable OCR for an additional 2 credits per page.

Example: 10-page PDF = 20 credits (or 40 credits with OCR)

Free Plan: 1,000 credits/month = 500 pages (or 250 pages with OCR)

Hobby Plan: 5,000 credits/month = 2,500 pages ($19/mo)

Professional Plan: 50,000 credits/month = 25,000 pages ($99/mo)

Related Tools

summarize_content

Summarize extracted document text (4 credits)

extract_text

Extract clean text from HTML documents (1 credit)

