Dify is an open-source LLM app development platform that lets you build AI applications with a visual workflow editor. By adding CrawlForge as a custom tool, your Dify workflows gain the ability to scrape websites, search the web, and extract structured data -- all without writing code.
This guide covers both the no-code approach (Dify's visual tool configuration) and the API-based approach for advanced integrations.
Table of Contents
- What Is Dify?
- Prerequisites
- Step 1: Set Up a Custom Tool Provider
- Step 2: Define CrawlForge Tool Schemas
- Step 3: Build a Web Research Workflow
- Step 4: Build a Content Extraction Pipeline
- Step 5: Handle Authentication and Errors
- Credit Cost Reference
- CrawlForge Tools Available in Dify
- Next Steps
What Is Dify?
Dify is a production-ready platform for building LLM applications. It provides a visual workflow builder, agent orchestration, RAG pipeline management, and a library of 50+ built-in tools. Dify supports custom tool integration through OpenAPI specifications, which means any REST API -- including CrawlForge -- can be added as a tool.
Dify's native MCP integration also means you can connect CrawlForge as an MCP server directly. This guide covers both approaches.
Prerequisites
- Dify instance -- either Dify Cloud or self-hosted via Docker
- A CrawlForge account with an API key (1,000 free credits)
- Admin access to your Dify workspace
Step 1: Set Up a Custom Tool Provider
In your Dify dashboard, navigate to Tools > Custom Tools > Create Custom Tool.
Paste the following OpenAPI specification to register CrawlForge's core tools:
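A minimal spec along these lines covers the core endpoints. The base URL and parameter schemas below are illustrative sketches, not authoritative definitions -- confirm both against the CrawlForge API Reference before importing:

```yaml
openapi: 3.0.1
info:
  title: CrawlForge
  description: Web scraping, search, and structured data extraction
  version: "1.0.0"
servers:
  # Hypothetical base URL -- use the one shown in your CrawlForge dashboard
  - url: https://api.crawlforge.dev
paths:
  /fetch_url:
    post:
      operationId: fetch_url
      summary: Fetch a web page (1 credit). Use when the user provides a specific URL.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [url]
              properties:
                url:
                  type: string
                  description: The page to fetch
      responses:
        "200":
          description: Page content
  /search_web:
    post:
      operationId: search_web
      summary: Search the web (5 credits). Use to find pages on a topic.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [query]
              properties:
                query:
                  type: string
                  description: Search query
      responses:
        "200":
          description: Search results
  # /extract_content and /scrape_structured follow the same POST + JSON pattern;
  # see the CrawlForge API reference for their exact parameter schemas.
```

Dify generates one tool card per `operationId`, so keep the IDs matched to the endpoint names above.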
Set the authentication to Bearer Token and enter your CrawlForge API key (cf_live_...).
Step 2: Define CrawlForge Tool Schemas
After importing the OpenAPI spec, Dify automatically generates tool cards for each endpoint. Configure each tool with descriptive names so the LLM agent can select them correctly:
| Dify Tool Name | CrawlForge Endpoint | Credits | When the Agent Should Use It |
|---|---|---|---|
| Fetch Web Page | /fetch_url | 1 | User provides a specific URL to read |
| Extract Content | /extract_content | 2 | Need clean, readable text from a page |
| Search the Web | /search_web | 5 | Need to find pages on a topic |
| Extract Structured Data | /scrape_structured | 2 | Need specific data points via CSS selectors |
For each tool in Dify, add a clear description that includes the credit cost. This helps the LLM agent make cost-efficient decisions.
Step 3: Build a Web Research Workflow
In Dify's workflow editor, create a new workflow with these nodes:
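One possible arrangement is sketched below. Node names and variable references are illustrative -- Dify resolves variables by node ID, so adapt the `{{#...#}}` references to your workflow's actual node IDs:

```
Start            (user question)
  → Search the Web        search_web, query: {{#start.question#}}     (5 credits)
  → Iteration             over the top 3 search results
      → Extract Content   extract_content, url: {{#item.url#}}        (2 credits each)
  → LLM                   summarize the extracted text, cite sources
  → End                   final answer
```

A run of this shape costs roughly 11 credits (one search plus three extractions), which is worth noting in the LLM node's prompt if you want the agent to stay cost-aware.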
The visual workflow in Dify makes this a drag-and-drop operation. Each node connects to the next, with data flowing through template variables.
Step 4: Build a Content Extraction Pipeline
For recurring data extraction tasks, build a pipeline workflow:
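A typical shape, sketched with illustrative node names (the webhook destination is an assumption -- point it at your own endpoint):

```
Start                      (scheduled trigger or URL-list input)
  → Extract Structured Data   scrape_structured, url + CSS selectors   (2 credits per page)
  → Code                      validate and reshape the returned JSON
  → HTTP Request              POST the cleaned records to your webhook or database
  → End
```

For larger URL lists, swapping the per-page node for `batch_scrape` (5 credits) avoids looping over individual calls.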
Step 5: Handle Authentication and Errors
Authentication
CrawlForge uses Bearer token authentication. In Dify, set this once at the custom tool provider level:
- Go to Tools > Custom Tools > CrawlForge
- Click Configure Authorization
- Select API Key (Bearer)
- Enter your CrawlForge API key
All tool calls within workflows automatically include the auth header.
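If you take the API-based approach instead -- for example, calling CrawlForge from a Dify Code node -- the same Bearer scheme applies. A minimal stdlib sketch; the base URL is an assumption, so substitute the one from your CrawlForge dashboard:

```python
import json
import urllib.request

def crawlforge_headers(api_key: str) -> dict:
    """Build the auth headers CrawlForge expects (Bearer token)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

def fetch_url(api_key: str, url: str,
              base: str = "https://api.crawlforge.dev") -> dict:
    """POST to /fetch_url (1 credit). Base URL here is hypothetical."""
    req = urllib.request.Request(
        f"{base}/fetch_url",
        data=json.dumps({"url": url}).encode(),
        headers=crawlforge_headers(api_key),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Keep the key in an environment variable or Dify's credential store rather than hard-coding it in a node.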
Error Handling
Add error handling nodes in your Dify workflow for common scenarios:
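The scenarios worth branching on can be expressed as a small Code-node sketch. The status codes are standard HTTP; the branch labels are illustrative, so name them after your own downstream nodes:

```python
def route_error(status_code: int) -> str:
    """Map a CrawlForge HTTP status to a Dify workflow branch.

    Branch labels are illustrative -- match them to your own nodes.
    """
    if status_code == 402:           # credit exhaustion: stop and notify
        return "notify_user"
    if status_code == 429:           # rate limited: let the retry back off
        return "retry"
    if status_code in (401, 403):    # bad or revoked API key
        return "check_credentials"
    if status_code >= 500:           # transient upstream failure
        return "retry"
    return "continue"                # success or recoverable response
```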
Dify's built-in retry mechanism handles transient failures automatically. For credit exhaustion errors (HTTP 402), route to a notification node that alerts the user.
Credit Cost Reference
| Credits | Tools | Dify Workflow Use Case |
|---|---|---|
| 1 | fetch_url, extract_text, extract_links, extract_metadata | Simple page fetching triggers |
| 2 | scrape_structured, extract_content, summarize_content, generate_llms_txt | Extraction pipeline nodes |
| 3 | map_site, process_document, analyze_content, localization | Site audit workflows |
| 5 | search_web, crawl_deep, batch_scrape, scrape_with_actions, stealth_mode | Research and bulk workflows |
| 10 | deep_research | Comprehensive analysis workflows |
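To budget a workflow before running it, the per-call costs above can be tallied in a small helper (credit values taken from the table; the function itself is a hypothetical sketch, not part of the CrawlForge SDK):

```python
# Credit cost per CrawlForge tool, from the reference table above
CREDIT_COSTS = {
    "fetch_url": 1, "extract_text": 1, "extract_links": 1, "extract_metadata": 1,
    "scrape_structured": 2, "extract_content": 2, "summarize_content": 2,
    "generate_llms_txt": 2,
    "map_site": 3, "process_document": 3, "analyze_content": 3, "localization": 3,
    "search_web": 5, "crawl_deep": 5, "batch_scrape": 5,
    "scrape_with_actions": 5, "stealth_mode": 5,
    "deep_research": 10,
}

def workflow_cost(calls: list[str]) -> int:
    """Total credits a sequence of tool calls will consume."""
    return sum(CREDIT_COSTS[tool] for tool in calls)

# A research workflow: one search plus three content extractions
workflow_cost(["search_web"] + ["extract_content"] * 3)  # → 11
```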
CrawlForge Tools Available in Dify
All 18 CrawlForge tools can be registered in Dify. The most commonly used in visual workflows are:
| Tool | Credits | Why It Works Well in Dify |
|---|---|---|
| search_web | 5 | Natural starting point for research workflows |
| extract_content | 2 | Clean output feeds directly into LLM nodes |
| scrape_structured | 2 | CSS selectors return predictable, structured JSON |
| fetch_url | 1 | Cheapest option for simple page access |
| batch_scrape | 5 | Handles loops more efficiently than individual calls |
Next Steps
- Dify Documentation -- official Dify platform docs
- CrawlForge API Reference -- endpoint schemas for all 18 tools
- Complete MCP Guide -- understanding MCP protocol integration
- CrawlForge Pricing -- credit packs starting at $19/month
Add web scraping to your Dify apps today. Get your free API key with 1,000 credits and register CrawlForge as a custom tool in Dify. No code required.