batch_scrape
Scrape multiple URLs in parallel with async job management, webhook notifications, and configurable concurrency. Perfect for bulk data collection and automated workflows.
Use Cases
Bulk Data Collection
Scrape product catalogs, news articles, or research papers across multiple pages simultaneously
Competitor Analysis
Monitor pricing, features, and content across competitor websites in one batch
Automated Workflows
Integrate with webhooks for real-time processing as scraping jobs complete
Scheduled Reporting
Generate daily reports by batch scraping dashboards, analytics, or status pages
Content Archival
Archive multiple pages as screenshots or PDFs for compliance or historical records
Parallel Processing
Control concurrency levels to optimize speed while respecting rate limits
Endpoint
/api/v1/tools/batch_scrape
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string[] | Required | - | Array of URLs to scrape (1-50 URLs). Example: `["https://example.com", "https://example.org"]` |
| formats | string[] | Optional | ["markdown"] | Output formats for each URL: markdown, html, text, screenshot, or pdf. Example: `["markdown", "screenshot"]` |
| webhook | string | Optional | - | Webhook URL to receive the job completion notification. Example: `https://yourapp.com/webhook/scrape-complete` |
| maxConcurrency | number | Optional | 5 | Maximum concurrent requests (1-10). Example: `10` |
| timeout | number | Optional | 30000 | Timeout per URL in milliseconds. Example: `45000` |
| onlyMainContent | boolean | Optional | false | Extract only main content, removing boilerplate. Example: `true` |
Request Examples
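A minimal request sketch in TypeScript. The base URL, bearer-token auth, and `API_KEY` environment variable are assumptions for illustration; only the path and body parameters come from this page:

```typescript
// Sketch: submit a batch scrape job.
// Assumptions: POST with a JSON body, bearer-token auth, placeholder base URL.
const response = await fetch("https://api.example.com/api/v1/tools/batch_scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.API_KEY}`, // assumed auth scheme
  },
  body: JSON.stringify({
    urls: ["https://example.com", "https://example.org", "https://example.net"],
    formats: ["markdown", "screenshot"],
    webhook: "https://yourapp.com/webhook/scrape-complete",
    maxConcurrency: 5,
    timeout: 30000,
    onlyMainContent: true,
  }),
});

const job = await response.json();
console.log(job.data.jobId); // e.g. "batch_1234567890abcdef"
```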
Response Example
{ "success": true, "data": { "jobId": "batch_1234567890abcdef", "status": "processing", "totalUrls": 3, "completed": 0, "successful": 0, "failed": 0, "startedAt": "2025-10-01T12:00:00Z", "estimatedCompletionAt": "2025-10-01T12:02:00Z", "results": [] }, "credits_used": 3, "credits_remaining": 997, "processing_time": 156}
data.jobId
Unique identifier for tracking this batch job
data.status
Job status: queued, processing, completed, or failed
data.totalUrls
Total number of URLs in the batch
data.completed
Number of URLs processed (successful + failed)
data.estimatedCompletionAt
Estimated completion time based on concurrency
credits_used
Credits reserved for this batch (1 per URL)
credits_remaining
Your remaining credit balance
Webhook Payload
When the batch completes, your webhook URL will receive:
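The exact payload shape isn't reproduced here; the sketch below is a representative guess modeled on the job status fields above, and the `event` and per-result fields are assumptions:

```json
{
  "event": "batch_scrape.completed",
  "jobId": "batch_1234567890abcdef",
  "status": "completed",
  "totalUrls": 3,
  "successful": 3,
  "failed": 0,
  "completedAt": "2025-10-01T12:01:45Z",
  "results": [
    {
      "url": "https://example.com",
      "status": "success",
      "markdown": "# Example Domain..."
    }
  ]
}
```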
Error Handling
Too Many URLs (400 Bad Request)
Maximum 50 URLs per batch. Split large batches into multiple requests.
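To stay under the limit, a simple chunking helper works; this is a sketch, and `submitBatch` is a hypothetical wrapper around the request example above:

```typescript
// Sketch: split an arbitrary URL list into batches of at most 50,
// then submit one batch_scrape job per chunk.
// submitBatch is a hypothetical wrapper around the request example above.
declare function submitBatch(urls: string[]): Promise<void>;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function submitAll(allUrls: string[]): Promise<void> {
  for (const batch of chunk(allUrls, 50)) {
    await submitBatch(batch); // one job per chunk of up to 50 URLs
  }
}
```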
Invalid Webhook URL (400 Bad Request)
Webhook must be a valid HTTPS URL. HTTP webhooks are not supported, for security reasons.
Insufficient Credits (402 Payment Required)
A batch reserves credits upfront (1 credit per URL). Add more credits before retrying.
Job Not Found (404 Not Found)
The job ID doesn't exist or has expired. Jobs are retained for 7 days after completion.
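To detect completion or expiry without a webhook, you can poll the job status. The GET route in this sketch is a hypothetical guess, since the status endpoint isn't documented in this section:

```typescript
// Sketch: poll for job completion. The GET status route shown here is a
// hypothetical guess; consult the job status docs for the real endpoint.
async function waitForJob(jobId: string, intervalMs = 5000): Promise<unknown> {
  for (;;) {
    const res = await fetch(
      `https://api.example.com/api/v1/tools/batch_scrape/${jobId}`, // hypothetical route
      { headers: { Authorization: `Bearer ${process.env.API_KEY}` } },
    );
    if (res.status === 404) {
      throw new Error("Job not found or expired (jobs are retained for 7 days)");
    }
    const { data } = await res.json();
    if (data.status === "completed" || data.status === "failed") return data;
    await new Promise((r) => setTimeout(r, intervalMs)); // wait before the next poll
  }
}
```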
Credit Cost
Batch scraping costs 1 credit per URL. Example costs:
10 URLs: 10 credits (perfect for small batches)
50 URLs (max): 50 credits (ideal for bulk scraping)
Plan Recommendations:
Free Plan: 1,000 credits = 20 batches of 50 URLs
Hobby Plan: 5,000 credits = 100 batches of 50 URLs ($19/mo)
Professional Plan: 50,000 credits = 1,000 batches of 50 URLs ($99/mo)