CrawlForge
Advanced Guide

Stealth Scraping Techniques

Bypass anti-bot detection systems with browser fingerprint randomization, IP rotation, user-agent spoofing, and CAPTCHA handling strategies.

Using stealth_mode Tool
Browser Fingerprinting
IP Rotation & Proxies
CAPTCHA Handling
Legal Notice: Always respect robots.txt and website terms of service. Use stealth techniques responsibly and only for legitimate purposes. Violating terms of service or scraping protected content may have legal consequences.

1. Using stealth_mode Tool

The stealth_mode tool applies anti-detection techniques automatically. Depending on the level you choose, these include fingerprint randomization, WebRTC leak protection, and canvas noise.

Basic (3 credits)
User-agent rotation, basic header spoofing

Use for: Low-protection sites, simple scrapers

Medium (3 credits)
Basic + fingerprint randomization, WebRTC leak protection

Use for: Most commercial sites, moderate protection

Advanced (3 credits)
Medium + canvas noise, WebGL spoofing, timezone randomization

Use for: High-protection sites, Cloudflare, Akamai
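
The three levels form a ladder, so a scraper can start low and step up only when it gets blocked. The following is a minimal sketch of that idea, not part of the CrawlForge API: the `StealthLevel` type and the `escalate` helper are hypothetical names, and the lowercase `'basic'` value is an assumption inferred from the `'medium'` and `'advanced'` values used in the request examples.

```typescript
// Stealth levels as described above. The string values are an assumption
// based on the "medium" and "advanced" values used in the request examples.
type StealthLevel = 'basic' | 'medium' | 'advanced';

const LEVEL_ORDER: StealthLevel[] = ['basic', 'medium', 'advanced'];

// Hypothetical helper: return the next stronger level,
// or null when the current level is already the strongest.
function escalate(level: StealthLevel): StealthLevel | null {
  const i = LEVEL_ORDER.indexOf(level);
  return i < LEVEL_ORDER.length - 1 ? LEVEL_ORDER[i + 1] : null;
}
```

A caller could retry a blocked request with `escalate(currentLevel)` and give up, or switch proxy type, once it returns `null`.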

Basic Stealth Scraping (3 credits)

Bash
curl -X POST https://crawlforge.dev/api/v1/tools/stealth_mode \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected-site.com",
    "level": "medium",
    "randomizeFingerprint": true
  }'

Advanced: Stealth + Proxy + Custom Headers

TypeScript
const response = await fetch('https://crawlforge.dev/api/v1/tools/stealth_mode', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://highly-protected-site.com',
    level: 'advanced',
    randomizeFingerprint: true,
    proxy: {
      server: 'http://proxy.example.com:8080',
      username: 'proxy_user',
      password: 'proxy_pass'
    },
    customHeaders: {
      'Accept-Language': 'en-US,en;q=0.9',
      'Accept-Encoding': 'gzip, deflate, br',
      'Referer': 'https://google.com',
      'Sec-Fetch-Dest': 'document',
      'Sec-Fetch-Mode': 'navigate',
      // A navigation that carries a Referer from another site is "cross-site", not "none"
      'Sec-Fetch-Site': 'cross-site'
    },
    geolocation: {
      latitude: 40.7128,
      longitude: -74.0060,
      accuracy: 100
    },
    timezone: 'America/New_York',
    locale: 'en-US'
  }),
});

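// Check the HTTP status before treating the request as successful
if (!response.ok) {
  throw new Error(`stealth_mode request failed: ${response.status} ${response.statusText}`);
}

const data = await response.json();
console.log('Request completed:', data);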

2. Browser Fingerprinting

Anti-bot systems fingerprint the browser, combining signals such as those listed below, to tell automated clients from real users. Randomizing these signals makes requests harder to link together or flag.

  • User-Agent: Browser version, OS, device type
  • Canvas Fingerprint: Unique rendering signature
  • WebGL: Graphics card vendor/renderer
  • WebRTC: Local IP address leaks
  • Screen Resolution: Display dimensions
  • Timezone & Locale: Geographic location indicators
  • Fonts: Installed font list
  • Plugins: Browser extensions detected
How CrawlForge Helps: The stealth_mode tool randomizes these fingerprint signals automatically (how many depends on the selected level), so each request appears to come from a distinct real browser.
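
To see why randomization matters, it helps to look at the detector's side. The sketch below is illustrative only and does not describe how any particular anti-bot product or CrawlForge itself works: it folds a few of the signals listed above into a single identifier using a small FNV-1a style hash. The `FingerprintSignals` fields and the `fingerprintId` function are hypothetical names.

```typescript
// Illustrative subset of the signals listed above; field names are hypothetical.
interface FingerprintSignals {
  userAgent: string;
  screen: string; // e.g. "1920x1080"
  timezone: string;
  locale: string;
  webglRenderer: string;
  fonts: string[];
}

// FNV-1a style 32-bit hash computed over UTF-16 code units.
// Returns an 8-character hex string.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Combine the signals into one stable identifier.
// Fonts are sorted so that enumeration order does not change the result.
function fingerprintId(s: FingerprintSignals): string {
  const canonical = JSON.stringify([
    s.userAgent,
    s.screen,
    s.timezone,
    s.locale,
    s.webglRenderer,
    [...s.fonts].sort(),
  ]);
  return fnv1a(canonical);
}
```

If every request produces the same identifier, the requests are trivially linkable. Changing even one signal, such as the timezone, yields a different identifier, which is the effect fingerprint randomization aims for.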

3. IP Rotation & Proxies

Use rotating proxies to distribute requests across different IP addresses.

  • Datacenter Proxies
    • ✅ Fast (50-150ms latency)
    • ✅ Cheap ($1-5/GB)
    • ❌ Easily detected
    • ❌ Higher ban rate
    • Best for: Low-protection sites, high-volume scraping
  • Residential Proxies (Recommended)
    • ✅ Real user IPs (hard to detect)
    • ✅ Low ban rate
    • ⚠️ Slower (200-500ms latency)
    • ⚠️ Expensive ($5-15/GB)
    • Best for: High-protection sites, e-commerce, social media
  • Mobile Proxies
    • ✅ Highest success rate (4G/5G IPs)
    • ✅ Nearly undetectable
    • ❌ Very expensive ($50-100/GB)
    • ❌ Slowest (300-1000ms latency)
    • Best for: Maximum stealth, premium targets
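
Because proxies are billed per gigabyte, the price ranges above translate directly into a cost per crawl. The following is a rough estimator, assuming billing by transferred bytes with 1 GB = 1024 × 1024 KB (providers may count decimal gigabytes instead). The function name and parameters are hypothetical.

```typescript
// Estimate proxy bandwidth cost in USD for a crawl.
// Assumes 1 GB = 1024 * 1024 KB; actual provider billing may differ.
function estimateProxyCostUsd(
  pages: number,
  avgPageKb: number,
  pricePerGbUsd: number,
): number {
  const totalGb = (pages * avgPageKb) / (1024 * 1024);
  return totalGb * pricePerGbUsd;
}
```

For example, 100,000 pages averaging 500 KB each (an assumed page size) come to about 47.7 GB, which is roughly $238 to $715 at the residential price range quoted above.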

IP Rotation Strategy

TypeScript
// Rotating proxy pool
const proxyPool = [
  'http://user:pass@proxy1.example.com:8080',
  'http://user:pass@proxy2.example.com:8080',
  'http://user:pass@proxy3.example.com:8080',
  'http://user:pass@proxy4.example.com:8080',
  'http://user:pass@proxy5.example.com:8080'
];

let currentProxyIndex = 0;

function getNextProxy() {
  const proxy = proxyPool[currentProxyIndex];
  currentProxyIndex = (currentProxyIndex + 1) % proxyPool.length;
  return proxy;
}

// Scrape with rotating proxies
async function scrapeWithRotatingProxy(url: string) {
  const proxy = getNextProxy();

  const response = await fetch('https://crawlforge.dev/api/v1/tools/stealth_mode', {
    method: 'POST',
    headers: {
      'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url,
      level: 'medium',
      proxy: { server: proxy }
    }),
  });

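  // Surface HTTP errors instead of parsing an error body as a result
  if (!response.ok) {
    throw new Error(`stealth_mode request failed for ${url}: ${response.status}`);
  }

  return response.json();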
}

// Scrape multiple URLs with different proxies
const urls = ['https://example.com/1', 'https://example.com/2', 'https://example.com/3'];
const results = await Promise.all(urls.map(scrapeWithRotatingProxy));
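
The example above sends all requests at once and switches proxy on every request. A more conservative variant, consistent with the delays and rotation intervals recommended in this guide, processes URLs one at a time. This is a minimal sketch: `proxyForRequest`, `randomDelayMs`, and `scrapeSequentially` are hypothetical helpers, and the `scrape` callback stands in for whatever function performs the actual request.

```typescript
// Pick the proxy for the n-th request (0-based),
// switching to the next proxy every `requestsPerProxy` requests.
function proxyForRequest(n: number, pool: string[], requestsPerProxy: number): string {
  if (pool.length === 0) throw new Error('proxy pool is empty');
  return pool[Math.floor(n / requestsPerProxy) % pool.length];
}

// Random delay in [minMs, maxMs). `rand` is injectable so the function can be tested.
function randomDelayMs(
  minMs = 2000,
  maxMs = 5000,
  rand: () => number = Math.random,
): number {
  return Math.floor(minMs + rand() * (maxMs - minMs));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Process URLs sequentially, pausing between requests.
// `scrape` is supplied by the caller (hypothetical signature).
async function scrapeSequentially<T>(
  urls: string[],
  pool: string[],
  scrape: (url: string, proxy: string) => Promise<T>,
  requestsPerProxy = 10,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(randomDelayMs());
    results.push(await scrape(urls[i], proxyForRequest(i, pool, requestsPerProxy)));
  }
  return results;
}
```

Sequential processing trades throughput for a lower request rate per target. If that is too slow, a small fixed number of parallel workers, each with its own delay loop, is a common middle ground.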

4. CAPTCHA Handling

Strategies for dealing with CAPTCHA challenges.

  1. Avoid Triggering CAPTCHAs
    • Use stealth mode, rotate IPs, respect rate limits, add random delays (2-5 seconds between requests)
    • ✅ Best strategy - prevention is easier than solving
  2. Use CAPTCHA Solving Services
    • Integrate with 2Captcha, Anti-Captcha, or DeathByCaptcha ($1-3 per 1,000 CAPTCHAs)
    • ⚠️ Adds cost and latency (10-30 seconds)
  3. Find Alternative Data Sources
    • Look for APIs, RSS feeds, sitemaps, or partner sites without CAPTCHA
    • ✅ Most reliable long-term solution
  4. Manual Intervention
    • Queue CAPTCHA challenges for human operators to solve
    • ❌ Only viable for low-volume scraping
Recommendation: If you're consistently hitting CAPTCHAs, you're scraping too aggressively. Slow down, rotate IPs more frequently, and use higher stealth levels before resorting to CAPTCHA solving services.
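
One way to act on this recommendation is to let the delay between requests grow whenever a CAPTCHA appears and shrink slowly once requests succeed again. The sketch below assumes the caller can tell when a response contained a CAPTCHA (how to detect that is not specified here). The constants and the `nextDelayMs` name are illustrative assumptions.

```typescript
const BASE_DELAY_MS = 2000; // lower end of the 2-5 second range used in this guide
const MAX_DELAY_MS = 60000; // arbitrary upper cap (assumption)

// Compute the delay to use before the next request.
// Doubles after a CAPTCHA (up to the cap), otherwise decays by 10% toward the base.
function nextDelayMs(currentMs: number, sawCaptcha: boolean): number {
  if (sawCaptcha) {
    return Math.min(currentMs * 2, MAX_DELAY_MS);
  }
  return Math.max(Math.floor(currentMs * 0.9), BASE_DELAY_MS);
}
```

Backing off quickly and recovering slowly keeps the scraper from oscillating between full speed and being challenged again.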

Best Practices Summary

  • Always start with stealth_mode level "medium"
  • Use residential proxies for high-protection sites
  • Rotate proxies every 10-20 requests
  • Add random delays between requests (2-5 seconds)
  • Match geolocation with proxy location (use localization tool)
  • Respect robots.txt and rate limits
  • Monitor ban rates and adjust strategy accordingly
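
Monitoring ban rates can be as simple as tracking the share of blocked responses over a rolling window. The following is a minimal sketch: `BanRateMonitor` and `isBlocked` are hypothetical names, and treating HTTP 403 and 429 as "blocked" is an assumption that should be adjusted to what the target site actually returns.

```typescript
// Assumption: 403 (Forbidden) and 429 (Too Many Requests) indicate a block.
function isBlocked(status: number): boolean {
  return status === 403 || status === 429;
}

// Tracks the fraction of blocked requests over the most recent `windowSize` requests.
class BanRateMonitor {
  private readonly windowSize: number;
  private outcomes: boolean[] = []; // true = blocked

  constructor(windowSize = 50) {
    this.windowSize = windowSize;
  }

  record(blocked: boolean): void {
    this.outcomes.push(blocked);
    if (this.outcomes.length > this.windowSize) {
      this.outcomes.shift(); // drop the oldest outcome
    }
  }

  // Returns a value between 0 and 1; 0 when nothing has been recorded yet.
  banRate(): number {
    if (this.outcomes.length === 0) return 0;
    return this.outcomes.filter(Boolean).length / this.outcomes.length;
  }
}
```

A scraper could call `monitor.record(isBlocked(response.status))` after each request and slow down, rotate proxies sooner, or raise the stealth level once `banRate()` crosses a threshold it considers acceptable.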