crawl_deep
Discover and crawl entire websites with intelligent breadth-first search, URL filtering, and configurable depth control. Respects robots.txt and crawl delays.
Use Cases
Site Architecture Analysis
Discover all pages and understand website structure for SEO audits
Content Discovery
Find all blog posts, products, or documentation pages automatically
Competitive Intelligence
Map competitor websites and discover new products or features
Broken Link Detection
Crawl sites to find 404s, redirects, and broken internal links
Data Migration
Discover all pages before migrating or archiving a website
Sitemap Generation
Create comprehensive sitemaps for SEO or documentation
Endpoint
/api/v1/tools/crawl_deep
Parameters
Name | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Required | - | Starting URL for the crawl; only links on this domain are followed when sameDomain is true. Example: https://example.com |
maxDepth | number | Optional | 3 | Maximum crawl depth (1-10 levels) Example: 5 |
maxPages | number | Optional | 100 | Maximum pages to crawl (1-1000) Example: 500 |
includePatterns | string[] | Optional | - | Only crawl URLs matching these regex patterns Example: ["/blog/.*", "/products/.*"] |
excludePatterns | string[] | Optional | - | Skip URLs matching these regex patterns Example: ["/admin/.*", ".*\\.(pdf|zip)$"] |
respectRobotsTxt | boolean | Optional | true | Respect robots.txt directives Example: true |
sameDomain | boolean | Optional | true | Only crawl URLs on the same domain Example: true |
crawlDelay | number | Optional | 1000 | Delay between requests in milliseconds (100-5000) Example: 2000 |
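For reference, the table above maps onto the following request payload shape. This is a convenience sketch derived from the parameter table, not an official SDK type.

```typescript
// Request payload shape for crawl_deep, derived from the parameter table above (sketch only).
interface CrawlDeepRequest {
  url: string;                // Starting URL for the crawl
  maxDepth?: number;          // 1-10 levels, default 3
  maxPages?: number;          // 1-1000 pages, default 100
  includePatterns?: string[]; // Regex patterns; only matching URLs are crawled
  excludePatterns?: string[]; // Regex patterns; matching URLs are skipped
  respectRobotsTxt?: boolean; // Default true
  sameDomain?: boolean;       // Default true
  crawlDelay?: number;        // 100-5000 ms between requests, default 1000
}
```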
Request Examples
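A minimal POST sketch using fetch in TypeScript. The API host (https://api.example.com), the Bearer-token Authorization header, and the API_KEY environment variable are assumptions; substitute your actual endpoint and credentials.

```typescript
// Minimal request sketch. Base URL and auth header are assumptions; adjust to your account setup.
async function startCrawl(): Promise<void> {
  const response = await fetch("https://api.example.com/api/v1/tools/crawl_deep", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY ?? ""}`,
    },
    body: JSON.stringify({
      url: "https://example.com",
      maxDepth: 5,
      maxPages: 500,
      includePatterns: ["/blog/.*"],
      excludePatterns: ["/admin/.*", ".*\\.(pdf|zip)$"],
      respectRobotsTxt: true,
      crawlDelay: 2000,
    }),
  });

  const result = await response.json();
  console.log(`Crawled ${result.data.pagesCrawled} pages, used ${result.credits_used} credits`);
}

startCrawl().catch(console.error);
```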
Response Example
{ "success": true, "data": { "startUrl": "https://example.com", "pagesDiscovered": 487, "pagesCrawled": 487, "maxDepthReached": 5, "robotsTxtRespected": true, "crawlStarted": "2025-10-01T12:00:00Z", "crawlCompleted": "2025-10-01T12:00:45Z", "urls": [ { "url": "https://example.com", "depth": 0, "status": 200, "title": "Example Domain", "linksFound": 15 }, { "url": "https://example.com/blog", "depth": 1, "status": 200, "title": "Blog - Example", "linksFound": 42 }, { "url": "https://example.com/blog/post-1", "depth": 2, "status": 200, "title": "First Blog Post", "linksFound": 8 } ], "statistics": { "status200": 450, "status301": 20, "status404": 15, "status500": 2, "avgResponseTime": 234, "totalSize": 12500000 } }, "credits_used": 487, "credits_remaining": 513, "processing_time": 45200}
Response Fields
data.pagesDiscovered
Total unique URLs found during the crawl
data.pagesCrawled
Number of pages successfully fetched
data.maxDepthReached
Maximum depth level reached
data.urls
Array of all discovered URLs with metadata
data.statistics
Aggregate crawl statistics
credits_used
1 credit per page crawled (not per page discovered)
processing_time
Total crawl duration in milliseconds (varies by site size)
Error Handling
Robots.txt Blocked (403 Forbidden)
The site's robots.txt disallows crawling. Set respectRobotsTxt=false to override (use responsibly).
Max Pages Reached (200 OK with warning)
Crawl stopped at maxPages limit. Increase limit or filter URLs more specifically.
Invalid Pattern (400 Bad Request)
includePatterns or excludePatterns contains invalid regex. Check pattern syntax.
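Patterns can be checked locally before sending the request to avoid this error. The sketch below uses JavaScript's RegExp constructor; note that the API's regex dialect may not match it exactly, and the helper name is hypothetical.

```typescript
// Sketch: validate include/exclude patterns locally before sending the request.
// The API's regex dialect may differ slightly from JavaScript's RegExp.
function findInvalidPatterns(patterns: string[]): string[] {
  return patterns.filter((p) => {
    try {
      new RegExp(p);
      return false; // pattern compiled fine
    } catch {
      return true; // invalid syntax
    }
  });
}

const bad = findInvalidPatterns(["/blog/.*", "[unclosed"]);
if (bad.length > 0) {
  console.warn("Invalid regex patterns:", bad); // e.g. ["[unclosed"]
}
```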
Insufficient Credits (402 Payment Required)
Credits are reserved upfront based on an estimate of the crawl size. Add more credits before starting large crawls.
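If the reservation is based on maxPages (an assumption, since cost is 1 credit per page crawled), a quick pre-flight check is straightforward:

```typescript
// Sketch: worst-case credit check before starting a crawl.
// Assumes the upfront reservation equals maxPages at 1 credit per page crawled.
function checkCreditBudget(maxPages: number, creditsRemaining: number): void {
  if (maxPages > creditsRemaining) {
    console.warn(`Crawl may need up to ${maxPages} credits but only ${creditsRemaining} remain.`);
  }
}

checkCreditBudget(500, 513); // ok: 500 <= 513
```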
Credit Cost
crawl_deep uses 1 credit per page crawled; pages that are discovered but not crawled cost nothing. Example costs:
Small site (50 pages): 50 credits
Medium site (500 pages): 500 credits
Large site (1000 pages max): 1,000 credits
Plan Recommendations:
Free Plan: 1,000 credits = 1,000 pages or 2 medium sites
Hobby Plan: 5,000 credits = 5,000 pages or 10 medium sites ($19/mo)
Professional Plan: 50,000 credits = 50 large sites ($99/mo)