Advanced Guide

Stealth Scraping Techniques

Bypass anti-bot detection systems with browser fingerprint randomization, IP rotation, user-agent spoofing, and CAPTCHA handling strategies.

Using stealth_mode Tool

Browser Fingerprinting

IP Rotation & Proxies

CAPTCHA Handling

Legal Notice: Always respect robots.txt and website terms of service. Use stealth techniques responsibly and only for legitimate purposes. Violating terms of service or scraping protected content may have legal consequences.
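The robots.txt part of the notice above can be checked in code before a crawl starts. The sketch below is deliberately simplified and is not a full RFC 9309 implementation:

- It assumes you have already downloaded the site's robots.txt as text.
- It reads only the `User-agent: *` group.
- It ignores the `*` and `$` wildcards.
- It applies the longest matching Allow/Disallow rule, with Allow winning a tie.

`isPathAllowed` and `parseWildcardGroup` are hypothetical names. For production use, prefer a maintained robots.txt parser.

```typescript
// Simplified robots.txt check (assumptions):
// - only the "User-agent: *" group is considered
// - wildcards ("*", "$") are NOT supported; rules are plain path prefixes
// - the longest matching rule wins; Allow wins a tie (as RFC 9309 recommends)
interface Rule {
  allow: boolean;
  prefix: string;
}

function parseWildcardGroup(robotsTxt: string): Rule[] {
  const rules: Rule[] = [];
  let inWildcardGroup = false;
  let lastLineWasUserAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Strip comments and surrounding whitespace.
    const line = rawLine.split('#')[0].trim();
    if (!line) continue;
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line after rule lines starts a new group;
      // consecutive User-agent lines share one group.
      if (!lastLineWasUserAgent) inWildcardGroup = false;
      if (value === '*') inWildcardGroup = true;
      lastLineWasUserAgent = true;
      continue;
    }
    lastLineWasUserAgent = false;

    if (!inWildcardGroup) continue;
    // An empty Disallow means "allow everything", so it adds no rule.
    if ((field === 'allow' || field === 'disallow') && value !== '') {
      rules.push({ allow: field === 'allow', prefix: value });
    }
  }
  return rules;
}

function isPathAllowed(robotsTxt: string, path: string): boolean {
  let best: Rule | undefined;
  for (const rule of parseWildcardGroup(robotsTxt)) {
    if (!path.startsWith(rule.prefix)) continue;
    if (
      !best ||
      rule.prefix.length > best.prefix.length ||
      (rule.prefix.length === best.prefix.length && rule.allow)
    ) {
      best = rule;
    }
  }
  // No matching rule means the path is allowed.
  return best ? best.allow : true;
}
```

For example, with `Disallow: /private/` and `Allow: /private/public-page` in the `*` group, `/private/public-page` is allowed because the Allow rule is the longer match.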

1. Using stealth_mode Tool

The stealth_mode tool automatically applies anti-detection techniques including fingerprint randomization, WebRTC spoofing, and canvas noise.

Basic (3 credits)

User-agent rotation, basic header spoofing

Use for: Low-protection sites, simple scrapers

Medium (3 credits)

Basic + fingerprint randomization, WebRTC leak protection

Use for: Most commercial sites, moderate protection

Advanced (3 credits)

Medium + canvas noise, WebGL spoofing, timezone randomization

Use for: High-protection sites, Cloudflare, Akamai
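The three levels above map to the `level` field of the request body, and the body can be typed so mistakes surface at compile time. This sketch is assembled only from the fields used in the examples on this page: `url`, `level`, `randomizeFingerprint`, `proxy`, `customHeaders`, `geolocation`, `timezone` and `locale`.

It is not the official schema; see the stealth_mode API reference for the authoritative field list. Two parts are assumptions:

- The lowercase `'basic'` value is inferred from `'medium'` and `'advanced'`.
- The `buildStealthRequest` helper is hypothetical.

```typescript
// Field names are copied from the examples in this guide; the type itself
// is an illustrative assumption, not CrawlForge's published schema.
type StealthLevel = 'basic' | 'medium' | 'advanced';

interface StealthModeRequest {
  url: string;
  level: StealthLevel;
  randomizeFingerprint?: boolean;
  proxy?: { server: string; username?: string; password?: string };
  customHeaders?: Record<string, string>;
  geolocation?: { latitude: number; longitude: number; accuracy?: number };
  timezone?: string;
  locale?: string;
}

// Hypothetical helper: validates the target URL and defaults the level to
// "medium", the starting level this guide recommends.
function buildStealthRequest(
  url: string,
  options: Partial<Omit<StealthModeRequest, 'url'>> = {},
): StealthModeRequest {
  const parsed = new URL(url); // throws TypeError on a malformed URL
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return { ...options, url, level: options.level ?? 'medium' };
}
```

For example, `JSON.stringify(buildStealthRequest('https://protected-site.com', { randomizeFingerprint: true }))` contains the same fields as the curl example below.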

Basic Stealth Scraping

3 credits

Bash

curl -X POST https://crawlforge.dev/api/v1/tools/stealth_mode \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected-site.com",
    "level": "medium",
    "randomizeFingerprint": true
  }'

Advanced: Stealth + Proxy + Custom Headers

TypeScript

const response = await fetch('https://crawlforge.dev/api/v1/tools/stealth_mode', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://highly-protected-site.com',
    level: 'advanced',
    randomizeFingerprint: true,
    proxy: {
      server: 'http://proxy.example.com:8080',
      username: 'proxy_user',
      password: 'proxy_pass'
    },
    customHeaders: {
      'Accept-Language': 'en-US,en;q=0.9',
      'Accept-Encoding': 'gzip, deflate, br',
      'Referer': 'https://google.com',
      'Sec-Fetch-Dest': 'document',
      'Sec-Fetch-Mode': 'navigate',
      'Sec-Fetch-Site': 'cross-site'
    },
    geolocation: {
      latitude: 40.7128,
      longitude: -74.0060,
      accuracy: 100
    },
    timezone: 'America/New_York',
    locale: 'en-US'
  }),
});

if (!response.ok) {
  throw new Error(`stealth_mode request failed with status ${response.status}`);
}

const data = await response.json();
console.log('stealth_mode request succeeded');

2. Browser Fingerprinting

Anti-bot systems use browser fingerprinting to detect automated browsers. Randomize fingerprints to avoid detection.

- User-Agent: browser version, OS, device type
- Canvas Fingerprint: unique rendering signature
- WebGL: graphics card vendor/renderer
- WebRTC: local IP address leaks
- Screen Resolution: display dimensions
- Timezone & Locale: geographic location indicators
- Fonts: installed font list
- Plugins: installed browser plugins and detectable extensions

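How CrawlForge Helps: The stealth_mode tool randomizes these fingerprint signals so that requests appear to come from different real browsers. Which signals are covered depends on the level. WebRTC leak protection starts at Medium, and canvas noise and WebGL spoofing are applied only at Advanced.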

3. IP Rotation & Proxies

Use rotating proxies to distribute requests across different IP addresses.

Datacenter Proxies
- ✅ Fast (50-150ms latency)
- ✅ Cheap ($1-5/GB)
- ❌ Easily detected
- ❌ Higher ban rate
- Best for: Low-protection sites, high-volume scraping
Residential Proxies (Recommended)
- ✅ Real user IPs (hard to detect)
- ✅ Low ban rate
- ⚠️ Slower (200-500ms latency)
- ⚠️ Expensive ($5-15/GB)
- Best for: High-protection sites, e-commerce, social media
Mobile Proxies
- ✅ Highest success rate (4G/5G IPs)
- ✅ Nearly undetectable
- ❌ Very expensive ($50-100/GB)
- ❌ Slowest (300-1000ms latency)
- Best for: Maximum stealth, premium targets
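Bandwidth-billed proxies can get expensive quickly, so a rough estimate helps when choosing a tier. The sketch below multiplies page count by an average transfer size per page and applies the per-GB price ranges listed above.

- The 2 MB page size in the example is a placeholder assumption; measure your own targets.
- It uses decimal gigabytes (1 GB = 1,000,000,000 bytes). Some providers bill in GiB.
- `estimateProxyCost` is a hypothetical name.

```typescript
// Per-GB price ranges (USD) taken from the proxy list above.
const pricePerGb = {
  datacenter: { min: 1, max: 5 },
  residential: { min: 5, max: 15 },
  mobile: { min: 50, max: 100 },
} as const;

type ProxyType = keyof typeof pricePerGb;

// Estimate the bandwidth cost of a crawl. avgPageBytes is an assumption to
// replace with a measured value (HTML plus any assets the browser loads).
function estimateProxyCost(type: ProxyType, pages: number, avgPageBytes: number) {
  const gb = (pages * avgPageBytes) / 1e9; // decimal gigabytes
  const { min, max } = pricePerGb[type];
  return { gb, minUsd: gb * min, maxUsd: gb * max };
}

// Example: 100,000 pages at an assumed 2 MB each = 200 GB.
const estimate = estimateProxyCost('residential', 100000, 2000000);
console.log(`${estimate.gb} GB, $${estimate.minUsd}-$${estimate.maxUsd}`);
// prints "200 GB, $1000-$3000"
```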

IP Rotation Strategy

TypeScript

// Rotating proxy pool
const proxyPool = [
  'http://user:pass@proxy1.example.com:8080',
  'http://user:pass@proxy2.example.com:8080',
  'http://user:pass@proxy3.example.com:8080',
  'http://user:pass@proxy4.example.com:8080',
  'http://user:pass@proxy5.example.com:8080'
];

let currentProxyIndex = 0;

function getNextProxy() {
  const proxy = proxyPool[currentProxyIndex];
  currentProxyIndex = (currentProxyIndex + 1) % proxyPool.length;
  return proxy;
}

// Scrape with rotating proxies
async function scrapeWithRotatingProxy(url: string) {
  const proxy = getNextProxy();

  const response = await fetch('https://crawlforge.dev/api/v1/tools/stealth_mode', {
    method: 'POST',
    headers: {
      'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url,
      level: 'medium',
      proxy: { server: proxy }
    }),
  });

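  if (!response.ok) {
    throw new Error(`stealth_mode request failed with status ${response.status}`);
  }
  return await response.json();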
}

// Scrape multiple URLs with different proxies
const urls = ['https://example.com/1', 'https://example.com/2', 'https://example.com/3'];
const results = await Promise.all(urls.map(scrapeWithRotatingProxy));
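The example above switches proxy on every request. The best practices summary at the end of this guide suggests keeping a proxy for 10-20 requests before rotating. One way to express that is a rotator that reuses each proxy for a fixed number of requests.

`StickyProxyRotator` and its `requestsPerProxy` parameter are hypothetical names. Using a fixed count, rather than a random count in the 10-20 range, is a simplification.

```typescript
// Hypothetical helper: returns the same proxy for `requestsPerProxy`
// consecutive calls, then advances to the next proxy in the pool.
class StickyProxyRotator {
  private readonly pool: readonly string[];
  private readonly requestsPerProxy: number;
  private index = 0;
  private usedOnCurrent = 0;

  constructor(pool: readonly string[], requestsPerProxy: number) {
    if (pool.length === 0) throw new Error('Proxy pool is empty');
    if (!Number.isInteger(requestsPerProxy) || requestsPerProxy < 1) {
      throw new Error('requestsPerProxy must be a positive integer');
    }
    this.pool = pool;
    this.requestsPerProxy = requestsPerProxy;
  }

  next(): string {
    if (this.usedOnCurrent === this.requestsPerProxy) {
      // Quota reached: move to the next proxy, wrapping around the pool.
      this.index = (this.index + 1) % this.pool.length;
      this.usedOnCurrent = 0;
    }
    this.usedOnCurrent++;
    return this.pool[this.index];
  }
}

// Usage: keep each proxy for 15 requests.
const rotator = new StickyProxyRotator(
  ['http://user:pass@proxy1.example.com:8080', 'http://user:pass@proxy2.example.com:8080'],
  15,
);
```

To use it in `scrapeWithRotatingProxy`, replace the `getNextProxy()` call with `rotator.next()`.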

4. CAPTCHA Handling

Strategies for dealing with CAPTCHA challenges.

Avoid Triggering CAPTCHAs
- Use stealth mode, rotate IPs, respect rate limits, add random delays (2-5 seconds between requests)
- ✅ Best strategy - prevention is easier than solving
Use CAPTCHA Solving Services
- Integrate with 2Captcha, Anti-Captcha, or DeathByCaptcha ($1-3 per 1,000 CAPTCHAs)
- ⚠️ Adds cost and latency (10-30 seconds)
Find Alternative Data Sources
- Look for APIs, RSS feeds, sitemaps, or partner sites without CAPTCHA
- ✅ Most reliable long-term solution
Manual Intervention
- Queue CAPTCHA challenges for human operators to solve
- ❌ Only viable for low-volume scraping
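Adding random delays of 2-5 seconds means sending requests one at a time rather than all at once with `Promise.all`. In this minimal sketch, `randomDelayMs` picks a uniformly random wait in the given range, and `scrapeSequentially` awaits that wait between requests.

Both names are hypothetical. The `scrape` parameter stands in for any per-URL function, such as `scrapeWithRotatingProxy` from section 3.

```typescript
// Uniformly random integer delay in [minMs, maxMs], inclusive.
function randomDelayMs(minMs = 2000, maxMs = 5000): number {
  if (minMs > maxMs) throw new Error('minMs must not exceed maxMs');
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Process URLs one at a time, waiting a random interval between requests.
async function scrapeSequentially<T>(
  urls: string[],
  scrape: (url: string) => Promise<T>,
  delay: () => number = () => randomDelayMs(),
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delay()); // no wait before the first request
    results.push(await scrape(urls[i]));
  }
  return results;
}
```

For example, `await scrapeSequentially(urls, scrapeWithRotatingProxy)` is a paced replacement for the `Promise.all` call in section 3.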

Recommendation: If you're consistently hitting CAPTCHAs, you're scraping too aggressively. Slow down, rotate IPs more frequently, and use higher stealth levels before resorting to CAPTCHA solving services.

Best Practices Summary

- Always start with stealth_mode level "medium"
- Use residential proxies for high-protection sites
- Rotate proxies every 10-20 requests
- Add random delays between requests (2-5 seconds)
- Match geolocation with proxy location (use localization tool)
- Respect robots.txt and rate limits
- Monitor ban rates and adjust strategy accordingly
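Monitoring ban rates can be as simple as tracking the outcome of recent requests and backing off when the blocked share rises. This sketch makes several assumptions you should tune for your targets:

- You can obtain the target site's HTTP status for each request. How stealth_mode reports that status is not covered here.
- HTTP 403 and 429 responses count as "blocked".
- It keeps a sliding window of the last 50 outcomes.
- It flags when more than 10% of that window is blocked.

`BanRateMonitor` is a hypothetical name.

```typescript
// Hypothetical sliding-window ban-rate monitor.
class BanRateMonitor {
  private readonly outcomes: boolean[] = []; // true = blocked
  private readonly windowSize: number;
  private readonly threshold: number;

  constructor(windowSize = 50, threshold = 0.1) {
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Assumption: 403 (Forbidden) and 429 (Too Many Requests) indicate blocking.
  record(status: number): void {
    this.outcomes.push(status === 403 || status === 429);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift(); // drop oldest
  }

  banRate(): number {
    if (this.outcomes.length === 0) return 0;
    const blocked = this.outcomes.filter(Boolean).length;
    return blocked / this.outcomes.length;
  }

  // True when the recent ban rate exceeds the threshold: a signal to slow
  // down, rotate proxies sooner, or raise the stealth level.
  shouldBackOff(): boolean {
    return this.banRate() > this.threshold;
  }
}
```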

Next Steps

Continue learning with more guides

Credit Optimization →

Minimize scraping costs

stealth_mode Tool →

Full API reference