Migrating 500 pages from WordPress to a headless CMS should take a weekend. In reality, it takes 3-6 weeks -- because someone has to manually copy content, fix formatting, re-link images, and verify every page. Content migration is the single most dreaded task in web development, and it is almost entirely automatable.
CrawlForge extracts your entire site's content programmatically: pages, metadata, images, links, and document structure. This guide shows you how to build a migration pipeline that moves thousands of pages between any two platforms in hours, not weeks.
Table of Contents
- Why Content Migration Is Painful
- Architecture Overview
- Step 1: Inventory Your Source Site
- Step 2: Extract Content and Metadata
- Step 3: Preserve Document Structure
- Step 4: Transform for the Target Platform
- Step 5: Validate the Migration
- Credit Cost Analysis
- Results and Benefits
- Frequently Asked Questions
Why Content Migration Is Painful
Content migration fails for three reasons:
- Volume: Even a small business site has 200-500 pages. Each page needs content, metadata, images, and internal links preserved
- Format mismatch: Source and target CMS use different content models (WordPress blocks vs. MDX vs. Contentful rich text)
- Hidden complexity: Shortcodes, embedded media, custom fields, redirects -- all need handling
Manual migration costs approximately $5-15 per page in analyst time. A 500-page migration at $10/page costs $5,000 in labor alone. Automated migration with CrawlForge costs under $50 in credits.
| Migration Method | Cost (500 pages) | Time | Error Rate |
|---|---|---|---|
| Manual copy-paste | $5,000-7,500 | 3-6 weeks | 5-10% |
| Semi-automated (scripts) | $2,000-3,000 | 1-2 weeks | 2-5% |
| CrawlForge pipeline | $20-50 | 2-4 hours | <1% |
Architecture Overview
The migration pipeline uses five CrawlForge tools:
| Stage | Tool | Credits | Purpose |
|---|---|---|---|
| Inventory | map_site | 3 | Discover all pages and their structure |
| Content extraction | extract_content | 2 | Pull clean content from each page |
| Metadata capture | extract_metadata | 1 | Preserve SEO tags and Open Graph data |
| Link mapping | extract_links | 1 | Map internal links for rewriting |
| Batch processing | batch_scrape | 5 | Process hundreds of pages efficiently |
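Here is a minimal Python sketch of how the stages chain together: one inventory call, then the per-page extraction tools. It assumes a caller-supplied `call_tool(name, arguments)` function (hypothetical; connect it to however you invoke CrawlForge). The `url` argument and the `urls` response key are also assumptions, not CrawlForge's documented schema. For brevity it calls the tools one page at a time. On a large site, this per-page loop is where batch_scrape would replace individual calls.

```python
from typing import Any, Callable

# Hypothetical: call_tool(tool_name, arguments) -> response dict.
# Connect it to your own CrawlForge client (MCP, HTTP, or SDK).
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]

PER_PAGE_TOOLS = ("extract_content", "extract_metadata", "extract_links")


def run_pipeline(start_url: str, call_tool: ToolCaller) -> list[dict[str, Any]]:
    """Inventory the site once, then run the per-page extraction tools."""
    # Stage 1: inventory. The "urls" response key is an assumption.
    site = call_tool("map_site", {"url": start_url})
    pages: list[dict[str, Any]] = []
    for url in site["urls"]:
        record: dict[str, Any] = {"url": url}
        # Stages 2-4: content, metadata, and links for each discovered page.
        for tool in PER_PAGE_TOOLS:
            record[tool] = call_tool(tool, {"url": url})
        pages.append(record)
    return pages
```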
Step 1: Inventory Your Source Site
Use map_site to discover every page on your source site, including pages that are not linked from the navigation.
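The discovered URL list usually needs cleanup before extraction. The same page can appear with and without a trailing slash or with a #fragment, and asset URLs and external links can be mixed in. The sketch below assumes you already have the discovered URLs as a plain list of strings; how you read them out of the map_site response depends on your client. It keeps query strings because WordPress's default "plain" permalinks (`?p=123`) identify pages by query. It does not merge `www.` and bare-domain variants.

```python
from urllib.parse import urlsplit, urlunsplit

# Asset extensions that are not pages to migrate (an illustrative, incomplete list).
ASSET_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".pdf", ".zip")


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, strip a trailing slash, drop the #fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def build_inventory(discovered: list[str], host: str) -> list[str]:
    """Deduplicate discovered URLs, keeping same-host page URLs in first-seen order."""
    seen: set[str] = set()
    inventory: list[str] = []
    for raw in discovered:
        url = normalize_url(raw)
        parts = urlsplit(url)
        if parts.netloc != host.lower():
            continue  # external link: not part of this migration
        if parts.path.lower().endswith(ASSET_EXTENSIONS):
            continue  # media files are handled separately
        if url not in seen:
            seen.add(url)
            inventory.append(url)
    return inventory
```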
Step 2: Extract Content and Metadata
Run extract_content and extract_metadata on every page to capture clean content, SEO tags, and Open Graph data, preserving heading structure and formatting.
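After content and metadata come back, merge both into one record per page so that later steps work on a single shape. The `PageRecord` fields and the response keys read below (`content`, `title`, `description`, `canonical`, `og`) are illustrative assumptions. Map them to the fields your CrawlForge responses actually contain.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PageRecord:
    """One page ready for transformation (illustrative fields, not a CrawlForge schema)."""
    url: str
    title: str
    body_markdown: str
    description: str = ""
    canonical: str = ""
    og: dict[str, str] = field(default_factory=dict)


def to_record(url: str, content: dict[str, Any], metadata: dict[str, Any]) -> PageRecord:
    """Merge assumed extract_content and extract_metadata response shapes into one record."""
    # Prefer the metadata title; fall back to a title from the content response.
    title = metadata.get("title") or content.get("title") or ""
    return PageRecord(
        url=url,
        title=title.strip(),
        body_markdown=(content.get("content") or "").strip(),
        description=(metadata.get("description") or "").strip(),
        # Keep an explicit canonical URL; otherwise treat the page URL as canonical.
        canonical=metadata.get("canonical") or url,
        og=dict(metadata.get("og") or {}),
    )
```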
Step 3: Preserve Document Structure
For large sites, process pages with batch_scrape instead of one request at a time, and use extract_links to build a complete link map for URL rewriting.
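This step needs two pieces: a way to split the inventory into batches, and an old-URL-to-new-slug map (the urlMap referred to in the FAQ). The sketch below builds both. Several choices here are specific to this example, not CrawlForge behavior: deriving each slug from the source path, using `index` as the home-page slug, and treating slug collisions as errors. Whatever batch size you pass should respect any limit that batch_scrape imposes.

```python
from urllib.parse import urlsplit


def chunk(urls: list[str], size: int) -> list[list[str]]:
    """Split the inventory into fixed-size batches."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def slug_for(url: str) -> str:
    """Derive a target slug from the source path, e.g. /blog/hello/ -> blog/hello."""
    path = urlsplit(url).path.strip("/")
    return path or "index"


def build_url_map(urls: list[str]) -> dict[str, str]:
    """Map each old URL to a new slug, failing loudly if two URLs share a slug."""
    url_map: dict[str, str] = {}
    owner: dict[str, str] = {}  # slug -> first URL that claimed it
    for url in urls:
        slug = slug_for(url)
        if slug in owner and owner[slug] != url:
            raise ValueError(f"slug collision: {url} and {owner[slug]} -> {slug}")
        owner[slug] = url
        url_map[url] = slug
    return url_map
```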
Step 4: Transform for the Target Platform
Rewrite internal links and transform content to match your target CMS format.
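For a Markdown or MDX target, you can rewrite links with a regular expression over inline links and then emit YAML frontmatter. This sketch rests on three assumptions. First, `url_map` keys are normalized: lowercase host, no trailing slash except on the root, and no fragment. Second, only inline `[text](url)` links need rewriting; reference-style links are left alone. Third, the target uses `title` and `description` frontmatter fields. Adjust the frontmatter to your target CMS's content model.

```python
import json
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Inline Markdown links: [text](target). The lookbehind skips images ![alt](src).
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")


def _key(url: str) -> str:
    """Normalize a URL the way url_map keys are assumed to be stored."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def rewrite_links(markdown: str, page_url: str, url_map: dict[str, str]) -> str:
    """Point links at migrated pages to their new slugs; leave everything else alone."""
    def repl(match: re.Match) -> str:
        text, target = match.group(1), match.group(2)
        absolute = urljoin(page_url, target)  # resolve relative links
        slug = url_map.get(_key(absolute))
        if slug is None:
            return match.group(0)  # external or unmapped link
        fragment = urlsplit(absolute).fragment
        anchor = f"#{fragment}" if fragment else ""
        return f"[{text}](/{slug}{anchor})"
    return LINK_RE.sub(repl, markdown)


def to_mdx(title: str, description: str, body: str) -> str:
    """Emit MDX with YAML frontmatter; JSON-quoted strings are valid YAML scalars."""
    return (
        "---\n"
        f"title: {json.dumps(title, ensure_ascii=False)}\n"
        f"description: {json.dumps(description, ensure_ascii=False)}\n"
        "---\n\n"
        f"{body}\n"
    )
```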
Step 5: Validate the Migration
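After transforming, verify that every page was migrated correctly: each inventoried URL should have a migrated page with a title and body, and no internal link should still point at the old site.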
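A few inexpensive automated checks catch most migration gaps before launch. This sketch assumes the migrated pages are held in a dict keyed by source URL, each with `title` and `body` strings (an illustrative shape). The old-host check is a heuristic: it will also flag legitimate mentions of the old domain in prose.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    url: str
    problem: str


def validate(source_urls: list[str], migrated: dict[str, dict[str, str]], old_host: str) -> list[Issue]:
    """Report missing pages, empty fields, and leftover references to the old host."""
    issues: list[Issue] = []
    for url in source_urls:
        page = migrated.get(url)
        if page is None:
            issues.append(Issue(url, "missing"))
            continue
        if not page.get("title", "").strip():
            issues.append(Issue(url, "empty title"))
        body = page.get("body", "")
        if not body.strip():
            issues.append(Issue(url, "empty body"))
        if old_host in body:  # heuristic: likely an un-rewritten internal link
            issues.append(Issue(url, "links to old host"))
    return issues
```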
Credit Cost Analysis
For a 500-page website migration:
| Operation | Tool | Credits | Quantity | Subtotal |
|---|---|---|---|---|
| Site inventory | map_site | 3 | 1 | 3 |
| Content extraction | extract_content | 2 | 500 pages | 1,000 |
| Metadata extraction | extract_metadata | 1 | 500 pages | 500 |
| Link extraction | extract_links | 1 | 500 pages | 500 |
| Total | | | | 2,003 |
A complete 500-page migration costs about 2,000 credits, which fits within the Hobby plan ($19/month, 3,000 credits) with room to spare. Larger sites need more. At about 4 credits per page, a 1,000-page migration uses roughly 4,000 credits, so sites of 1,000+ pages need the Professional plan ($99/month, 15,000 credits), which provides plenty of headroom.
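The credit arithmetic above takes only a few lines of code, which is handy for sizing a migration of a different page count. The per-call costs come from the tables in this guide. The functions count only map_site plus the three per-page extraction calls, so add any batch_scrape or fetch_url (image download) usage separately.

```python
# Per-call credit costs from the tables in this guide.
CREDITS = {"map_site": 3, "extract_content": 2, "extract_metadata": 1, "extract_links": 1}
PER_PAGE = CREDITS["extract_content"] + CREDITS["extract_metadata"] + CREDITS["extract_links"]


def migration_credits(pages: int) -> int:
    """One map_site call plus content, metadata, and link extraction per page."""
    return CREDITS["map_site"] + PER_PAGE * pages


def max_pages(budget: int) -> int:
    """Largest page count whose migration fits within a credit budget."""
    return max(0, (budget - CREDITS["map_site"]) // PER_PAGE)
```

For example, `migration_credits(1000)` returns 4003, more than the Hobby plan's 3,000 credits, and `max_pages(1000)` returns 249.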
Results and Benefits
Automated content migration delivers:
- Speed: Migrate 500 pages in 2-4 hours instead of 3-6 weeks
- Accuracy: No copy-paste errors, broken formatting, or missed pages
- Completeness: Every page, every meta tag, every internal link captured
- Cost savings: $5,000+ in manual labor replaced by $19-99 in tool credits
CrawlForge is best for content migrations where you need to preserve SEO equity -- meta tags, internal links, canonical URLs, and content structure all transfer cleanly.
Frequently Asked Questions
Can CrawlForge handle WordPress shortcodes?
CrawlForge's extract_content tool processes the rendered HTML, not raw WordPress source. Shortcodes are already expanded to their output HTML when CrawlForge extracts them. You get the rendered content, which is what you want for migration.
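One caveat: rendering expands only the shortcodes that some active plugin or theme still registers. WordPress outputs an unregistered shortcode as literal text, so content from a deactivated plugin can arrive as raw `[tag ...]` markup. A quick regex scan over the extracted text flags these. The pattern below is a heuristic sketch, and it can also match bracketed prose or reference-style Markdown links.

```python
import re

# WordPress-style shortcodes: [tag], [tag attr="x"], or closing [/tag].
# The (?!\() lookahead skips inline Markdown links such as [about](/about).
SHORTCODE_RE = re.compile(r"\[/?([a-z][a-z0-9_-]*)(?:\s[^\]]*)?\](?!\()")


def find_leftover_shortcodes(text: str) -> set[str]:
    """Return the names of shortcode-like tags still present in extracted text."""
    return {match.group(1) for match in SHORTCODE_RE.finditer(text)}
```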
What about images and media files?
CrawlForge extracts image URLs from content. You will need a separate step to download and re-host images on your target platform. The fetch_url tool (1 credit) can download individual media files.
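Before downloading anything, plan where each image will live and rewrite the references in the extracted Markdown. This sketch assumes the extracted content uses Markdown image syntax (`![alt](src)`) and that images will live under a hypothetical `/images` prefix on the target. The downloads themselves (with fetch_url or any HTTP client) are left out. Each filename gets a prefix of 8 hex characters from a hash of the full source URL, so two different `logo.png` files do not overwrite each other.

```python
import hashlib
import posixpath
import re
from urllib.parse import urljoin, urlsplit

# Markdown images: ![alt](src)
IMG_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def plan_image_moves(markdown: str, page_url: str, asset_prefix: str = "/images") -> dict[str, str]:
    """Map each absolute image URL to a new path under asset_prefix (hypothetical layout)."""
    moves: dict[str, str] = {}
    for _alt, src in IMG_RE.findall(markdown):
        absolute = urljoin(page_url, src)
        if absolute in moves:
            continue
        name = posixpath.basename(urlsplit(absolute).path) or "image"
        # A short hash of the full source URL keeps same-named files from colliding.
        digest = hashlib.sha1(absolute.encode("utf-8")).hexdigest()[:8]
        moves[absolute] = f"{asset_prefix}/{digest}-{name}"
    return moves


def rewrite_images(markdown: str, page_url: str, moves: dict[str, str]) -> str:
    """Point image references at their planned new paths."""
    def repl(match: re.Match) -> str:
        new_path = moves.get(urljoin(page_url, match.group(2)))
        return f"![{match.group(1)}]({new_path})" if new_path else match.group(0)
    return IMG_RE.sub(repl, markdown)
```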
How do I handle redirects after migration?
The urlMap built in Step 3 gives you a complete old-URL-to-new-slug mapping. Export it as a redirect map for your hosting platform (Vercel's vercel.json, Netlify's _redirects file, or an nginx config).
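You can generate all three formats from the same map. This sketch takes a dict of old URL to new slug and emits three outputs: a vercel.json `redirects` array, Netlify `_redirects` lines, and nginx exact-match `location` blocks. Vercel's `permanent: true` issues a 308 rather than a 301. These rules match on path only, so URLs that rely on query strings (such as WordPress `?p=123` links) need separate handling.

```python
import json
from urllib.parse import urlsplit


def _redirect_pairs(url_map: dict[str, str]) -> list[tuple[str, str]]:
    """(old path, new path) pairs, skipping pages whose path did not change."""
    pairs = []
    for old_url, slug in url_map.items():
        source = urlsplit(old_url).path or "/"
        destination = f"/{slug}"
        if source != destination:
            pairs.append((source, destination))
    return pairs


def vercel_json(url_map: dict[str, str]) -> str:
    """vercel.json with a redirects array (permanent: true responds with 308)."""
    redirects = [
        {"source": src, "destination": dst, "permanent": True}
        for src, dst in _redirect_pairs(url_map)
    ]
    return json.dumps({"redirects": redirects}, indent=2)


def netlify_redirects(url_map: dict[str, str]) -> str:
    """Netlify _redirects file: one 'from to status' rule per line."""
    return "".join(f"{src} {dst} 301\n" for src, dst in _redirect_pairs(url_map))


def nginx_rules(url_map: dict[str, str]) -> str:
    """Exact-match nginx location blocks, to paste inside a server block."""
    return "".join(
        f"location = {src} {{ return 301 {dst}; }}\n" for src, dst in _redirect_pairs(url_map)
    )
```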
Migrate your site this weekend. Start free with 1,000 credits, enough to migrate roughly 250 pages at about 4 credits per page. No credit card required.