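Use Cases
E-commerce product scraping
Extract product titles, prices, descriptions, and images from online stores
News article extraction
Extract headlines, authors, dates, and body text from news sites
Custom data transformation
Map any HTML structure onto the JSON schema you need
Real estate listings
Extract property details, prices, and images from listing sites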
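To illustrate the custom data transformation use case: the response example in this document returns every extracted value as a string (for example `"price": "299.99"`), so a client that wants a typed schema has to convert the fields itself. The sketch below assumes that string-only behavior; the `Product` class, the `to_product` helper, and the "In Stock" comparison are hypothetical names and conventions chosen for illustration, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical target schema; adjust the fields to your own needs.
    title: str
    price: float
    currency: str
    in_stock: bool


def to_product(data: dict) -> Product:
    """Convert the string fields found under `data` into typed values."""
    return Product(
        title=data["title"].strip(),
        price=float(data["price"]),
        currency=data["currency"].strip(),
        # Assumption: the target site prints "In Stock" for available items.
        in_stock=data["availability"].strip().lower() == "in stock",
    )


# Subset of the `data` object shown in the response example of this document.
sample = {
    "title": "Premium Wireless Headphones",
    "price": "299.99",
    "currency": "USD",
    "availability": "In Stock",
}
product = to_product(sample)
print(product)
```

Keeping the conversion in one function means a selector change on the target site only requires touching the selector map and this mapping, not the rest of the client.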
Endpoint
POST /api/v1/tools/scrape_structured
Auth Required
Free plan: 2 req/s
Cost: 2 credits
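Staying under the Free plan limit of 2 req/s is easiest to handle on the client side. The following is a minimal sketch of a throttle that spaces calls at least 0.5 seconds apart. The `Throttle` class is a hypothetical helper, and how the server actually measures the rate (fixed window, sliding window, and so on) is not documented here, so treat the even spacing as a conservative assumption.

```python
import time


class Throttle:
    """Spaces calls so that at most `rate` of them start per second."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._next_allowed = 0.0     # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the slot that follows the one just granted.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Free plan: 2 requests per second.
throttle = Throttle(rate=2)
# Typical use (not executed here):
#   for url in urls:
#       throttle.wait()
#       send_request(url)   # hypothetical function that performs the POST
```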
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Required | - | URL to scrape. Example: `https://example.com/product` |
| selectors | object | Required | - | Object mapping field names to CSS selectors. Example: `{"title": "h1.product-title", "price": ".price", "description": ".product-desc"}` |
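CSS selectors:
Any valid CSS selector syntax can be used. Common patterns:
- .className - select by class
- #id - select by ID
- tag.class - combine a tag and a class
- .parent > .child - direct child element
- [data-id="value"] - attribute selector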
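The selector patterns above can be combined in a single `selectors` object. Here is a minimal sketch that builds the request body and checks the two required parameters before sending. `build_payload` is a hypothetical helper, the field names `sku`, `first_feature`, and `variant` are made up for illustration, and the checks are local conventions of this sketch rather than documented server-side validation rules.

```python
def build_payload(url: str, selectors: dict) -> dict:
    """Return the JSON body for scrape_structured after basic local checks."""
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if not isinstance(selectors, dict) or not selectors:
        raise ValueError("selectors must be a non-empty object")
    for field, css in selectors.items():
        if not isinstance(css, str) or not css.strip():
            raise ValueError(f"selector for {field!r} must be a non-empty string")
    return {"url": url, "selectors": dict(selectors)}


payload = build_payload(
    "https://example.com/product",
    {
        "title": "h1.product-title",            # tag.class
        "price": ".price",                      # .className
        "sku": "#sku",                          # #id
        "first_feature": ".features > .item",   # .parent > .child
        "variant": '[data-id="value"]',         # attribute selector
    },
)
print(payload)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, as in the Python request example in this document.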
Request Examples
cURL - E-commerce product
curl -X POST https://crawlforge.dev/api/v1/tools/scrape_structured \
  -H "X-API-Key: cf_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "selectors": {
      "title": "h1.product-title",
      "price": ".price-value",
      "currency": ".price-currency",
      "description": ".product-description",
      "image": "img.main-image",
      "rating": ".rating-value",
      "availability": ".stock-status"
    }
  }'
TypeScript - News article
scrapeStructured.ts
// Send the article URL together with a field-name-to-selector map.
const response = await fetch('https://crawlforge.dev/api/v1/tools/scrape_structured', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.CRAWLFORGE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/news/article-123',
    selectors: {
      headline: 'h1.article-title',
      author: '.author-name',
      publishDate: 'time.publish-date',
      category: '.category-tag',
      content: '.article-body',
      image: '.article-image img'
    }
  }),
});

const data = await response.json();

// Extracted fields are returned under `data`, keyed by the names used in `selectors`.
if (data.success) {
  const article = data.data;
  console.log(`Article: ${article.headline}`);
  console.log(`By: ${article.author}`);
  console.log(`Published: ${article.publishDate}`);
}
Python - Real estate listing
scrape_structured.py
import os

import requests

# Send the listing URL together with a field-name-to-selector map.
response = requests.post(
    'https://crawlforge.dev/api/v1/tools/scrape_structured',
    headers={
        'X-API-Key': os.environ['CRAWLFORGE_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com/property/456',
        'selectors': {
            'address': '.property-address',
            'price': '.listing-price',
            'bedrooms': '.bed-count',
            'bathrooms': '.bath-count',
            'sqft': '.square-feet',
            'description': '.property-description',
            'images': '.gallery img'
        }
    }
)

data = response.json()

# Extracted fields are returned under 'data', keyed by the names used in 'selectors'.
if data['success']:
    property_data = data['data']
    print(f"Property: {property_data['address']}")
    print(f"Price: {property_data['price']}")
    print(f"Beds: {property_data['bedrooms']}")
    print(f"Baths: {property_data['bathrooms']}")
Response Example
200 OK (320ms)
{
  "success": true,
  "data": {
    "title": "Premium Wireless Headphones",
    "price": "299.99",
    "currency": "USD",
    "description": "High-quality wireless headphones with active noise cancellation and 30-hour battery life.",
    "image": "https://example.com/images/headphones.jpg",
    "rating": "4.7",
    "availability": "In Stock"
  },
  "credits_used": 2,
  "credits_remaining": 998,
  "processing_time": 320
}
Field Descriptions
data.title: extracted from the h1.product-title selector
data.price: extracted from the .price-value selector
data.description: extracted from the .product-description selector
credits_used: credits deducted for this request (2 per scrape)
Credit Cost
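2 credits per request
Each structured scrape costs 2 credits, regardless of how many selectors you pass.
Tip: When scraping many pages that share the same structure, use batch_scrape for better efficiency.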
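Because the cost is flat per request, budgeting is simple arithmetic. The sketch below estimates the credits needed for a number of pages and how many pages a given balance covers. Both function names are hypothetical, and the numbers apply only to individual `scrape_structured` calls; the pricing of `batch_scrape` is not described in this section and is not modeled here.

```python
CREDITS_PER_REQUEST = 2  # flat cost per scrape_structured call, as stated above


def estimate_credits(page_count: int) -> int:
    """Credits needed to scrape `page_count` pages, one request per page."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return page_count * CREDITS_PER_REQUEST


def pages_affordable(credits_remaining: int) -> int:
    """Number of single-page requests the remaining balance can pay for."""
    return max(0, credits_remaining) // CREDITS_PER_REQUEST


print(estimate_credits(500))    # 500 pages at 2 credits each
print(pages_affordable(998))    # balance taken from the response example
```

The number of selectors never appears in the calculation, which reflects the flat pricing: adding more fields to one request does not raise its cost.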
Ready to extract structured data? Sign up for free and get 1,000 credits.