CrawlForge
Structured Tools · 2 credits

scrape_structured

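Extract structured data from any web page using custom CSS selectors. Well suited to e-commerce product scraping, news aggregation, and any other custom data extraction task.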

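Use Cases

E-commerce product scraping

Extract product titles, prices, descriptions, and images from online stores

News article extraction

Extract headlines, authors, dates, and body text from news sites

Custom data transformation

Map any HTML structure onto the JSON schema you need

Real estate listings

Extract property details, prices, and images from listing sites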

Endpoint

POST /api/v1/tools/scrape_structured
Auth Required
Free plan: 2 req/s
2 credits
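
The Free plan limit of 2 req/s implies leaving at least 0.5 s between calls. The following is a minimal sketch of client-side pacing, assuming that evenly spaced requests are enough to stay under the limit. The `Throttle` class and its `clock` and `sleep` parameters are hypothetical helpers for illustration, not part of CrawlForge.

```python
import time


class Throttle:
    """Spaces calls so that at most `rate` of them start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock          # injectable for testing (assumption)
        self._sleep = sleep          # injectable for testing (assumption)
        self._next_allowed = None    # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Free plan: 2 requests per second, so at least 0.5 s between calls.
throttle = Throttle(rate=2)
```

Calling `throttle.wait()` immediately before each request keeps a simple sequential loop within the stated rate.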

Parameters

Name | Type | Required | Default | Description
url | string | Required | - | URL to scrape. Example: https://example.com/product
selectors | object | Required | - | Object mapping each output field name to a CSS selector. Example: {"title": "h1.product-title", "price": ".price", "description": ".product-desc"}
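
To illustrate the two required parameters, here is a small sketch that assembles and sanity-checks the request body before it is sent. The `build_payload` helper and its validation rules (http(s) URLs only, non-empty selector strings) are assumptions made for this example; the API itself may accept more or reject less.

```python
import json


def build_payload(url, selectors):
    """Return the JSON request body, validating both required parameters."""
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http(s) URL string")
    if not isinstance(selectors, dict) or not selectors:
        raise ValueError("selectors must be a non-empty object")
    for field, selector in selectors.items():
        # Each key is an output field name; each value is a CSS selector.
        if not isinstance(field, str) or not isinstance(selector, str):
            raise ValueError("field names and selectors must be strings")
        if not selector.strip():
            raise ValueError(f"selector for {field!r} must not be empty")
    return json.dumps({"url": url, "selectors": selectors})


body = build_payload(
    "https://example.com/product",
    {"title": "h1.product-title", "price": ".price", "description": ".product-desc"},
)
```

Catching malformed input locally avoids spending a request on a payload that is unlikely to succeed.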

CSS selectors:

Any valid CSS selector syntax is accepted. Common patterns:

  • .className - select by class
  • #id - select by ID
  • tag.class - combine a tag and a class
  • .parent > .child - direct child
  • [data-id="value"] - attribute selector
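
The patterns listed above can be mixed freely within one `selectors` object. The sketch below uses one selector of each kind; all field names and class names are hypothetical and would need to match the target page's markup. Serializing with `json.dumps` also takes care of escaping the double quotes inside the attribute selector.

```python
import json

# Field names (keys) are arbitrary; selectors (values) are hypothetical.
selectors = {
    "summary": ".summary",            # by class
    "main": "#main-content",          # by ID
    "title": "h1.product-title",      # tag + class
    "first_item": ".list > .item",    # direct child
    "sku": '[data-id="sku"]',         # attribute selector
}

# json.dumps escapes the inner quotes of the attribute selector.
request_body = json.dumps(
    {"url": "https://example.com/product", "selectors": selectors},
    indent=2,
)
print(request_body)
```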

Request Examples

cURL - E-commerce product

terminal (Bash)
curl -X POST https://crawlforge.dev/api/v1/tools/scrape_structured \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "selectors": {
      "title": "h1.product-title",
      "price": ".price-value",
      "currency": ".price-currency",
      "description": ".product-description",
      "image": "img.main-image",
      "rating": ".rating-value",
      "availability": ".stock-status"
    }
  }'

TypeScript - News article

scrapeStructured.ts (TypeScript)
const response = await fetch('https://crawlforge.dev/api/v1/tools/scrape_structured', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/news/article-123',
    selectors: {
      headline: 'h1.article-title',
      author: '.author-name',
      publishDate: 'time.publish-date',
      category: '.category-tag',
      content: '.article-body',
      image: '.article-image img'
    }
  }),
});

const data = await response.json();

if (data.success) {
  const article = data.data;
  console.log(`Article: ${article.headline}`);
  console.log(`By: ${article.author}`);
  console.log(`Published: ${article.publishDate}`);
}

Python - Real estate listing

scrape_structured.py (Python)
import requests
import os

response = requests.post(
    'https://crawlforge.dev/api/v1/tools/scrape_structured',
    headers={
        'X-API-Key': os.environ['CRAWLFORGE_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com/property/456',
        'selectors': {
            'address': '.property-address',
            'price': '.listing-price',
            'bedrooms': '.bed-count',
            'bathrooms': '.bath-count',
            'sqft': '.square-feet',
            'description': '.property-description',
            'images': '.gallery img'
        }
    },
    timeout=30,  # avoid hanging indefinitely on a slow response
)

data = response.json()

if data['success']:
    property_data = data['data']
    print(f"Property: {property_data['address']}")
    print(f"Price: {property_data['price']}")
    print(f"Beds: {property_data['bedrooms']}")
    print(f"Baths: {property_data['bathrooms']}")

Response Example

200 OK · 320ms
{
  "success": true,
  "data": {
    "title": "Premium Wireless Headphones",
    "price": "299.99",
    "currency": "USD",
    "description": "High-quality wireless headphones with active noise cancellation and 30-hour battery life.",
    "image": "https://example.com/images/headphones.jpg",
    "rating": "4.7",
    "availability": "In Stock"
  },
  "credits_used": 2,
  "credits_remaining": 998,
  "processing_time": 320
}
Field Descriptions
data.title: extracted from the h1.product-title selector
data.price: extracted from the .price-value selector
data.description: extracted from the .product-description selector
credits_used: credits deducted for this request (2 per scrape)
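
In the response example, extracted values such as `price` and `rating` arrive as strings. The following sketch converts them into typed values, assuming every field is present and formatted as in that example. The `parse_product` helper is hypothetical, and the "In Stock" comparison depends entirely on the wording used by the scraped site.

```python
from decimal import Decimal

# Sample body copied from the response example.
response = {
    "success": True,
    "data": {
        "title": "Premium Wireless Headphones",
        "price": "299.99",
        "currency": "USD",
        "rating": "4.7",
        "availability": "In Stock",
    },
    "credits_used": 2,
    "credits_remaining": 998,
    "processing_time": 320,
}


def parse_product(payload):
    """Convert the string fields of a successful response into typed values."""
    if not payload.get("success"):
        raise ValueError("request did not succeed")
    data = payload["data"]
    return {
        "title": data["title"],
        "price": Decimal(data["price"]),   # Decimal avoids float rounding on money
        "currency": data["currency"],
        "rating": float(data["rating"]),
        # Site-specific wording; adjust for the page being scraped.
        "in_stock": data["availability"].strip().lower() == "in stock",
    }


product = parse_product(response)
```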

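Credit Cost

2 credits per request
Each structured scrape costs 2 credits, regardless of the number of selectors.

Tip: When scraping multiple pages that share the same structure, batch_scrape can be more efficient.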
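
Because the cost is flat per request, budgeting is simple arithmetic. A short sketch follows, assuming every request is charged the full 2 credits; how failed requests are billed is not stated here. The helper names are hypothetical.

```python
CREDITS_PER_REQUEST = 2  # flat cost per scrape_structured call


def credits_needed(page_count):
    """Credits required to scrape `page_count` pages, one request per page."""
    return page_count * CREDITS_PER_REQUEST


def max_requests(credit_balance):
    """Number of scrape_structured calls a given balance can cover."""
    return credit_balance // CREDITS_PER_REQUEST


print(credits_needed(150))   # 300
print(max_requests(1000))    # 500
```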

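Related Tools

batch_scrape
Scrape multiple URLs concurrently (5 credits)
structured_extract
AI-assisted extraction with no selectors required (3 credits)

Ready to extract structured data? Sign up for free and get 1,000 credits.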