crawl_deep
Discover and crawl entire websites with intelligent breadth-first search, URL filtering, and configurable depth control. Respects robots.txt and crawl delays.
Use Cases
Site Architecture Analysis
Discover all pages and understand website structure for SEO audits
Content Discovery
Find all blog posts, products, or documentation pages automatically
Competitive Intelligence
Map competitor websites and discover new products or features
Broken Link Detection
Crawl sites to find 404s, redirects, and broken internal links
Data Migration
Discover all pages before migrating or archiving a website
Sitemap Generation
Create comprehensive sitemaps for SEO or documentation
Endpoint
/api/v1/tools/crawl_deep
Parameters
Name | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Required | - | Starting URL for the crawl; only links on this domain are followed when sameDomain is true. Example: https://example.com |
maxDepth | number | Optional | 3 | Maximum crawl depth (1-10 levels) Example: 5 |
maxPages | number | Optional | 100 | Maximum pages to crawl (1-1000) Example: 500 |
includePatterns | string[] | Optional | - | Only crawl URLs matching these regex patterns Example: ["/blog/.*", "/products/.*"] |
excludePatterns | string[] | Optional | - | Skip URLs matching these regex patterns Example: ["/admin/.*", ".*\\.(pdf|zip)$"] |
respectRobotsTxt | boolean | Optional | true | Respect robots.txt directives Example: true |
sameDomain | boolean | Optional | true | Only crawl URLs on the same domain Example: true |
crawlDelay | number | Optional | 1000 | Delay between requests in milliseconds (100-5000) Example: 2000 |
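For reference, the table above maps onto the following request payload shape. This is a convenience sketch derived from the parameter table, not an official SDK type.

```typescript
// Request payload shape for crawl_deep, derived from the parameter table above (sketch only).
interface CrawlDeepRequest {
  url: string;                // Starting URL for the crawl
  maxDepth?: number;          // 1-10 levels, default 3
  maxPages?: number;          // 1-1000 pages, default 100
  includePatterns?: string[]; // Regex patterns; only matching URLs are crawled
  excludePatterns?: string[]; // Regex patterns; matching URLs are skipped
  respectRobotsTxt?: boolean; // Default true
  sameDomain?: boolean;       // Default true
  crawlDelay?: number;        // 100-5000 ms between requests, default 1000
}
```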
Request Examples
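A minimal POST sketch using fetch in TypeScript. The API host (https://api.example.com), the Bearer-token Authorization header, and the API_KEY environment variable are assumptions; substitute your actual endpoint and credentials.

```typescript
// Minimal request sketch. Base URL and auth header are assumptions; adjust to your account setup.
async function startCrawl(): Promise<void> {
  const response = await fetch("https://api.example.com/api/v1/tools/crawl_deep", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY ?? ""}`,
    },
    body: JSON.stringify({
      url: "https://example.com",
      maxDepth: 5,
      maxPages: 500,
      includePatterns: ["/blog/.*"],
      excludePatterns: ["/admin/.*", ".*\\.(pdf|zip)$"],
      respectRobotsTxt: true,
      crawlDelay: 2000,
    }),
  });

  const result = await response.json();
  console.log(`Crawled ${result.data.pagesCrawled} pages, used ${result.credits_used} credits`);
}

startCrawl().catch(console.error);
```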
Response Example
{ "success": true, "data": { "startUrl": "https://example.com", "pagesDiscovered": 487, "pagesCrawled": 487, "maxDepthReached": 5, "robotsTxtRespected": true, "crawlStarted": "2025-10-01T12:00:00Z", "crawlCompleted": "2025-10-01T12:00:45Z", "urls": [ { "url": "https://example.com", "depth": 0, "status": 200, "title": "Example Domain", "linksFound": 15 }, { "url": "https://example.com/blog", "depth": 1, "status": 200, "title": "Blog - Example", "linksFound": 42 }, { "url": "https://example.com/blog/post-1", "depth": 2, "status": 200, "title": "First Blog Post", "linksFound": 8 } ], "statistics": { "status200": 450, "status301": 20, "status404": 15, "status500": 2, "avgResponseTime": 234, "totalSize": 12500000 } }, "credits_used": 487, "credits_remaining": 513, "processing_time": 45200}
Response Fields
data.pagesDiscovered
Total unique URLs found during the crawl
data.pagesCrawled
Number of pages successfully fetched
data.maxDepthReached
Maximum depth level reached
data.urls
Array of all discovered URLs with metadata
data.statistics
Aggregate crawl statistics
credits_used
1 credit per page crawled (not per page discovered)
processing_time
Total crawl duration in milliseconds (varies by site size)
Error Handling
Robots.txt Blocked (403 Forbidden)
The site's robots.txt disallows crawling. Set respectRobotsTxt=false to override (use responsibly).
Max Pages Reached (200 OK with warning)
Crawl stopped at maxPages limit. Increase limit or filter URLs more specifically.
Invalid Pattern (400 Bad Request)
includePatterns or excludePatterns contains invalid regex. Check pattern syntax.
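Patterns can be checked locally before sending the request to avoid this error. The sketch below uses JavaScript's RegExp constructor; note that the API's regex dialect may not match it exactly, and the helper name is hypothetical.

```typescript
// Sketch: validate include/exclude patterns locally before sending the request.
// The API's regex dialect may differ slightly from JavaScript's RegExp.
function findInvalidPatterns(patterns: string[]): string[] {
  return patterns.filter((p) => {
    try {
      new RegExp(p);
      return false; // pattern compiled fine
    } catch {
      return true; // invalid syntax
    }
  });
}

const bad = findInvalidPatterns(["/blog/.*", "[unclosed"]);
if (bad.length > 0) {
  console.warn("Invalid regex patterns:", bad); // e.g. ["[unclosed"]
}
```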
Insufficient Credits (402 Payment Required)
Credits are reserved upfront based on an estimate of the crawl size. Add more credits before starting large crawls.
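If the reservation is based on maxPages (an assumption, since cost is 1 credit per page crawled), a quick pre-flight check is straightforward:

```typescript
// Sketch: worst-case credit check before starting a crawl.
// Assumes the upfront reservation equals maxPages at 1 credit per page crawled.
function checkCreditBudget(maxPages: number, creditsRemaining: number): void {
  if (maxPages > creditsRemaining) {
    console.warn(`Crawl may need up to ${maxPages} credits but only ${creditsRemaining} remain.`);
  }
}

checkCreditBudget(500, 513); // ok: 500 <= 513
```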
Credit Cost
crawl_deep uses 1 credit per page crawled; pages that are discovered but not crawled cost nothing. Example costs:
Small site (50 pages): 50 credits
Medium site (500 pages): 500 credits
Large site (1000 pages max): 1,000 credits
Plan Recommendations:
Free Plan: 1,000 credits = 1,000 pages or 2 medium sites
Hobby Plan: 5,000 credits = 5,000 pages or 10 medium sites ($19/mo)
Professional Plan: 50,000 credits = 50 large sites ($99/mo)