Advanced Guide
Stealth Scraping Techniques
Bypass anti-bot detection systems with advanced browser fingerprinting, IP rotation, user-agent spoofing, and CAPTCHA handling strategies.
Legal Notice: Always respect robots.txt and website terms of service. Use stealth techniques responsibly and only for legitimate purposes. Violating terms of service or scraping protected content may have legal consequences.
1. Using stealth_mode Tool
The stealth_mode tool automatically applies anti-detection techniques such as fingerprint randomization, WebRTC leak protection, and canvas noise. Which techniques run depends on the level you select:
Basic (3 credits)
User-agent rotation, basic header spoofing
Use for: Low-protection sites, simple scrapers
Medium (3 credits)
Basic + fingerprint randomization, WebRTC leak protection
Use for: Most commercial sites, moderate protection
Advanced (3 credits)
Medium + canvas noise, WebGL spoofing, timezone randomization
Use for: High-protection sites, Cloudflare, Akamai
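The three levels can be modeled as a small union type so that a mistyped level name fails at compile time instead of at request time. The sketch below is illustrative only: `StealthLevel`, `StealthRequest`, and `buildStealthRequest` are hypothetical helper names, the field names are copied from the request examples in this guide rather than from a published schema, and turning `randomizeFingerprint` off for the basic level is an assumption based on the level descriptions above.

```typescript
// Hypothetical helper types; field names mirror the request examples in this guide.
type StealthLevel = 'basic' | 'medium' | 'advanced';

interface StealthRequest {
  url: string;
  level: StealthLevel;
  randomizeFingerprint: boolean;
}

// Assumption: fingerprint randomization is listed from the medium level upward,
// so this sketch only sets the flag for 'medium' and 'advanced'.
function buildStealthRequest(url: string, level: StealthLevel = 'medium'): StealthRequest {
  return {
    url,
    level,
    randomizeFingerprint: level !== 'basic',
  };
}

// Serialize the object to use it as the JSON body of a stealth_mode request.
const requestBody: string = JSON.stringify(
  buildStealthRequest('https://protected-site.com', 'advanced'),
);
```

Defaulting to 'medium' follows the best-practice advice later in this guide; pass a different level explicitly when a site needs less or more protection.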
Basic Stealth Scraping
3 credits
Bash
curl -X POST https://crawlforge.dev/api/v1/tools/stealth_mode \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected-site.com",
    "level": "medium",
    "randomizeFingerprint": true
  }'
Advanced: Stealth + Proxy + Custom Headers
Typescript
const response = await fetch('https://crawlforge.dev/api/v1/tools/stealth_mode', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://highly-protected-site.com',
    level: 'advanced',
    randomizeFingerprint: true,
    // Route the request through your own proxy (placeholder credentials)
    proxy: {
      server: 'http://proxy.example.com:8080',
      username: 'proxy_user',
      password: 'proxy_pass'
    },
    // Headers a real browser would send on a top-level navigation
    customHeaders: {
      'Accept-Language': 'en-US,en;q=0.9',
      'Accept-Encoding': 'gzip, deflate, br',
      'Referer': 'https://google.com',
      'Sec-Fetch-Dest': 'document',
      'Sec-Fetch-Mode': 'navigate',
      'Sec-Fetch-Site': 'none'
    },
    // Keep geolocation, timezone, and locale consistent with each other
    geolocation: {
      latitude: 40.7128,
      longitude: -74.0060,
      accuracy: 100
    },
    timezone: 'America/New_York',
    locale: 'en-US'
  }),
});
// fetch() does not reject on HTTP error statuses, so check the status explicitly
if (!response.ok) {
  throw new Error(`stealth_mode request failed: ${response.status} ${response.statusText}`);
}
const data = await response.json();
console.log('stealth_mode response received:', data);
2. Browser Fingerprinting
Anti-bot systems use browser fingerprinting to detect automated browsers. Randomize fingerprints to avoid detection.
- User-Agent: Browser version, OS, device type
- Canvas Fingerprint: Unique rendering signature
- WebGL: Graphics card vendor/renderer
- WebRTC: Local IP address leaks
- Screen Resolution: Display dimensions
- Timezone & Locale: Geographic location indicators
- Fonts: Installed font list
- Plugins: Browser extensions detected
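To see why randomizing even one of these signals matters, it helps to picture how a detector might fold them into a single identifier. The sketch below is a simplified illustration, not how any particular anti-bot vendor works: `FingerprintSignals`, `fnv1a`, and `fingerprintId` are hypothetical names, the sample values are made up, and the FNV-1a-style hash stands in for whatever function a real system uses.

```typescript
// Illustrative subset of the signals listed above; real detectors collect many more.
interface FingerprintSignals {
  userAgent: string;
  canvasHash: string;
  webglRenderer: string;
  screen: string;
  timezone: string;
  fonts: string[];
}

// FNV-1a-style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('0000000' + hash.toString(16)).slice(-8);
}

// Join the signals in a fixed order so identical signals always yield the same ID.
function fingerprintId(s: FingerprintSignals): string {
  const canonical = [
    s.userAgent,
    s.canvasHash,
    s.webglRenderer,
    s.screen,
    s.timezone,
    s.fonts.join(','),
  ].join('|');
  return fnv1a(canonical);
}

// Made-up sample values for illustration.
const baseSignals: FingerprintSignals = {
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64)',
  canvasHash: 'c0ffee01',
  webglRenderer: 'Example GPU',
  screen: '1920x1080',
  timezone: 'America/New_York',
  fonts: ['Arial', 'Verdana'],
};

// Canvas noise changes only the canvas signal, yet the combined ID changes too.
const noisySignals: FingerprintSignals = { ...baseSignals, canvasHash: 'c0ffee02' };
const sameWhenUnchanged = fingerprintId(baseSignals) === fingerprintId({ ...baseSignals });
const differentWithNoise = fingerprintId(baseSignals) !== fingerprintId(noisySignals);
```

The point of the sketch is the asymmetry: a scraper that sends identical signals on every request produces one stable ID that is easy to block, while a small change to a single signal yields an unrelated ID.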
How CrawlForge Helps: The stealth_mode tool automatically randomizes all these fingerprint signals, making each request appear to come from a unique real browser.
3. IP Rotation & Proxies
Use rotating proxies to distribute requests across different IP addresses.
- Datacenter Proxies
  - ✅ Fast (50-150ms latency)
  - ✅ Cheap ($1-5/GB)
  - ❌ Easily detected
  - ❌ Higher ban rate
  - Best for: Low-protection sites, high-volume scraping
- Residential Proxies (Recommended)
  - ✅ Real user IPs (hard to detect)
  - ✅ Low ban rate
  - ⚠️ Slower (200-500ms latency)
  - ⚠️ Expensive ($5-15/GB)
  - Best for: High-protection sites, e-commerce, social media
- Mobile Proxies
  - ✅ Highest success rate (4G/5G IPs)
  - ✅ Nearly undetectable
  - ❌ Very expensive ($50-100/GB)
  - ❌ Slowest (300-1000ms latency)
  - Best for: Maximum stealth, premium targets
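Because all three proxy types are priced per gigabyte, a rough bandwidth estimate is often enough to choose between them. The following sketch encodes the price and latency ranges from the list above and turns a page count into a cost range. `PROXY_PROFILES` and `estimateCostUsd` are hypothetical names, the average page size is a number you would have to measure yourself, and the calculation assumes decimal units (1 GB = 1,000,000 KB).

```typescript
type ProxyType = 'datacenter' | 'residential' | 'mobile';

interface ProxyProfile {
  costPerGbUsd: [number, number]; // [low, high] from the comparison above
  latencyMs: [number, number];    // [low, high] from the comparison above
}

const PROXY_PROFILES: Record<ProxyType, ProxyProfile> = {
  datacenter: { costPerGbUsd: [1, 5], latencyMs: [50, 150] },
  residential: { costPerGbUsd: [5, 15], latencyMs: [200, 500] },
  mobile: { costPerGbUsd: [50, 100], latencyMs: [300, 1000] },
};

// Estimate the [low, high] proxy bandwidth cost in USD for a scraping job.
// Assumption: traffic is billed on transferred page bytes, with 1 GB = 1e6 KB.
function estimateCostUsd(type: ProxyType, pages: number, avgPageKb: number): [number, number] {
  const gigabytes = (pages * avgPageKb) / 1e6;
  const [low, high] = PROXY_PROFILES[type].costPerGbUsd;
  return [low * gigabytes, high * gigabytes];
}

// 100,000 pages at roughly 500 KB each is about 50 GB of traffic.
const residentialEstimate = estimateCostUsd('residential', 100000, 500);
```

Under these assumptions the same 50 GB job ranges from $50-250 on datacenter proxies to $2,500-5,000 on mobile proxies, which is why the cheaper types are worth trying first on low-protection sites.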
IP Rotation Strategy
Typescript
// Rotating proxy pool (placeholder hosts and credentials)
const proxyPool = [
  'http://user:pass@proxy1.example.com:8080',
  'http://user:pass@proxy2.example.com:8080',
  'http://user:pass@proxy3.example.com:8080',
  'http://user:pass@proxy4.example.com:8080',
  'http://user:pass@proxy5.example.com:8080'
];
let currentProxyIndex = 0;
// Return the next proxy in round-robin order
function getNextProxy(): string {
  const proxy = proxyPool[currentProxyIndex];
  currentProxyIndex = (currentProxyIndex + 1) % proxyPool.length;
  return proxy;
}
// Scrape one URL through the next proxy in the pool
async function scrapeWithRotatingProxy(url: string) {
  const proxy = getNextProxy();
  const response = await fetch('https://crawlforge.dev/api/v1/tools/stealth_mode', {
    method: 'POST',
    headers: {
      'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url,
      level: 'medium',
      proxy: { server: proxy }
    }),
  });
  // fetch() does not reject on HTTP error statuses, so check the status explicitly
  if (!response.ok) {
    throw new Error(`stealth_mode request failed for ${url}: ${response.status}`);
  }
  return response.json();
}
// Scrape multiple URLs with different proxies (all requests start concurrently)
const urls = ['https://example.com/1', 'https://example.com/2', 'https://example.com/3'];
const results = await Promise.all(urls.map(scrapeWithRotatingProxy));
4. CAPTCHA Handling
Strategies for dealing with CAPTCHA challenges.
- Avoid Triggering CAPTCHAs
  - Use stealth mode, rotate IPs, respect rate limits, add random delays (2-5 seconds between requests)
  - ✅ Best strategy - prevention is easier than solving
- Use CAPTCHA Solving Services
  - Integrate with 2Captcha, Anti-Captcha, or DeathByCaptcha ($1-3 per 1,000 CAPTCHAs)
  - ⚠️ Adds cost and latency (10-30 seconds)
- Find Alternative Data Sources
  - Look for APIs, RSS feeds, sitemaps, or partner sites without CAPTCHA
  - ✅ Most reliable long-term solution
- Manual Intervention
  - Queue CAPTCHA challenges for human operators to solve
  - ❌ Only viable for low-volume scraping
Recommendation: If you're consistently hitting CAPTCHAs, you're scraping too aggressively. Slow down, rotate IPs more frequently, and use higher stealth levels before resorting to CAPTCHA solving services.
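The recommendation above amounts to a simple feedback loop: watch the CAPTCHA rate, and when it climbs, slow down and raise the stealth level before paying for a solver. The sketch below shows one way that loop could look. All names (`ScrapeState`, `pickDelayMs`, `adjustForCaptchas`) are hypothetical, and both the 5% threshold and the choice to double the delay window are arbitrary starting points, not values recommended by CrawlForge.

```typescript
type StealthLevel = 'basic' | 'medium' | 'advanced';

// Each level escalates to the next one; 'advanced' is the ceiling.
const NEXT_LEVEL: Record<StealthLevel, StealthLevel> = {
  basic: 'medium',
  medium: 'advanced',
  advanced: 'advanced',
};

interface ScrapeState {
  level: StealthLevel;
  minDelayMs: number;
  maxDelayMs: number;
}

// Pick a random delay in [minDelayMs, maxDelayMs). The random source is
// injectable so the function can be tested deterministically.
function pickDelayMs(state: ScrapeState, rng: () => number = Math.random): number {
  return Math.floor(state.minDelayMs + rng() * (state.maxDelayMs - state.minDelayMs));
}

// If the CAPTCHA rate over a recent window exceeds the threshold, double the
// delay window and move up one stealth level; otherwise leave the state as is.
function adjustForCaptchas(
  state: ScrapeState,
  captchaCount: number,
  requestCount: number,
  threshold = 0.05, // assumed starting point, tune per site
): ScrapeState {
  if (requestCount === 0 || captchaCount / requestCount <= threshold) {
    return state;
  }
  return {
    level: NEXT_LEVEL[state.level],
    minDelayMs: state.minDelayMs * 2,
    maxDelayMs: state.maxDelayMs * 2,
  };
}

// Start with the 2-5 second delay window suggested above.
const initialState: ScrapeState = { level: 'medium', minDelayMs: 2000, maxDelayMs: 5000 };
const escalated = adjustForCaptchas(initialState, 10, 100); // 10% CAPTCHA rate
```

Between requests, wait `pickDelayMs(state)` milliseconds, and call `adjustForCaptchas` after each batch with that batch's counts.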
Best Practices Summary
- Always start with stealth_mode level "medium"
- Use residential proxies for high-protection sites
- Rotate proxies every 10-20 requests
- Add random delays between requests (2-5 seconds)
- Match geolocation with proxy location (use the localization tool)
- Respect robots.txt and rate limits
- Monitor ban rates and adjust strategy accordingly
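Rotating every 10-20 requests, rather than on every request, can be done with plain index arithmetic. The helper below is a minimal sketch under that assumption: `proxyForRequest` is a hypothetical name, the default of 15 requests per proxy is just the midpoint of the suggested range, and the pool entries are placeholders.

```typescript
// Return the proxy for the i-th request (0-based), moving to the next pool
// entry after every `requestsPerProxy` requests and wrapping around at the end.
function proxyForRequest(
  pool: readonly string[],
  requestIndex: number,
  requestsPerProxy = 15, // assumed midpoint of the 10-20 range
): string {
  if (pool.length === 0) {
    throw new Error('proxy pool is empty');
  }
  if (!Number.isInteger(requestIndex) || requestIndex < 0) {
    throw new Error('requestIndex must be a non-negative integer');
  }
  if (!Number.isInteger(requestsPerProxy) || requestsPerProxy < 1) {
    throw new Error('requestsPerProxy must be a positive integer');
  }
  const slot = Math.floor(requestIndex / requestsPerProxy) % pool.length;
  return pool[slot];
}

// Placeholder pool for illustration.
const examplePool = [
  'http://proxy1.example.com:8080',
  'http://proxy2.example.com:8080',
  'http://proxy3.example.com:8080',
];

const firstProxy = proxyForRequest(examplePool, 0);   // requests 0-14 share a proxy
const secondProxy = proxyForRequest(examplePool, 15); // request 15 moves to the next
```

Deriving the proxy from the request index keeps the function stateless, so it behaves the same whether requests run sequentially or are retried out of order.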