
Robots.txt

Category: Web Scraping

Definition

Robots.txt is a plain text file placed at the root of a website that tells web crawlers which paths they may and may not access. It is part of the Robots Exclusion Protocol, standardized as RFC 9309.
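
For example, a minimal robots.txt might look like the following. The paths and crawler name are illustrative, and Crawl-delay is a non-standard directive that only some crawlers honor:

    User-agent: *
    Disallow: /admin/
    Crawl-delay: 10

    User-agent: ExampleBot
    Disallow: /

    Sitemap: https://example.com/sitemap.xml

Here all crawlers are asked to skip /admin/ and wait 10 seconds between requests, while the hypothetical ExampleBot is barred from the entire site.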

How It Relates to CrawlForge

The robots.txt file acts as a set of guidelines for crawlers. While not legally binding, respecting it is considered standard practice for ethical scraping. It can specify which paths are off-limits, how long crawlers should wait between requests, and where the site's XML sitemaps are located.

CrawlForge tools respect robots.txt directives by default. When using crawl_deep or map_site, the crawler checks robots.txt before accessing pages, ensuring your scraping activity stays within the site owner's stated preferences.
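
Outside of CrawlForge, you can apply the same check in your own code. Below is a minimal sketch using Python's standard-library urllib.robotparser; the site URL and the MyBot user agent string are placeholders, and this illustrates the general mechanism rather than CrawlForge's internals:

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's robots.txt (example.com is a placeholder).
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    user_agent = "MyBot"  # hypothetical crawler name

    # Ask whether a specific path may be crawled.
    if rp.can_fetch(user_agent, "https://example.com/private/"):
        print("Allowed to crawl /private/")
    else:
        print("Disallowed by robots.txt")

    # Honor the site's crawl-delay preference, if one is declared.
    print("Crawl delay:", rp.crawl_delay(user_agent))  # None if unset

    # List sitemaps declared in robots.txt (Python 3.8+).
    print("Sitemaps:", rp.site_maps())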

Related CrawlForge Tools

  • crawl_deep (5 credits)
  • map_site (3 credits)

Related Terms

Web Crawler

A web crawler is a program that systematically browses the web by following links from page to page. Crawlers discover and index content across entire websites or domains.
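
As an illustration, here is a minimal breadth-first crawler sketched in Python with the third-party requests and BeautifulSoup libraries. The start URL, page cap, and error handling are simplified assumptions; a polite crawler would also check robots.txt (as above) and throttle its requests:

    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def crawl(start_url: str, max_pages: int = 20) -> set[str]:
        """Breadth-first crawl of one domain, fetching at most max_pages pages."""
        domain = urlparse(start_url).netloc
        seen, queue, fetched = {start_url}, deque([start_url]), 0
        while queue and fetched < max_pages:
            url = queue.popleft()
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue  # skip pages that fail to load
            fetched += 1
            # Enqueue unseen links that stay on the same domain.
            for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"]).split("#")[0]  # drop fragments
                if urlparse(link).netloc == domain and link not in seen:
                    seen.add(link)
                    queue.append(link)
        return seen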

Sitemap

A sitemap is an XML file that lists a website's URLs, along with optional metadata such as last modification date and priority. It helps search engines and crawlers discover and index pages efficiently.
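
For example, a small sitemap might look like this (the URLs, dates, and priorities are illustrative):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/</loc>
        <lastmod>2025-01-15</lastmod>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://example.com/blog/</loc>
        <lastmod>2025-01-10</lastmod>
        <priority>0.8</priority>
      </url>
    </urlset>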

Rate Limiting

Rate limiting is a technique used by websites and APIs to cap the number of requests a client can make within a given time window. It prevents server overload and defends against abusive scraping; clients that exceed the limit typically receive an HTTP 429 Too Many Requests response.
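
From the scraper's side, the polite counterpart is to throttle your own requests. A minimal sketch in plain Python, where the one-request-per-two-seconds rate is an arbitrary choice:

    import time

    class Throttle:
        """Enforce a minimum interval between consecutive requests."""

        def __init__(self, min_interval_s: float = 2.0):
            self.min_interval_s = min_interval_s
            self._last_request = 0.0

        def wait(self) -> None:
            # Sleep just long enough to stay under the allowed rate.
            elapsed = time.monotonic() - self._last_request
            if elapsed < self.min_interval_s:
                time.sleep(self.min_interval_s - elapsed)
            self._last_request = time.monotonic()

    throttle = Throttle(min_interval_s=2.0)
    for url in ["https://example.com/a", "https://example.com/b"]:
        throttle.wait()          # blocks until it is polite to proceed
        print("fetching", url)   # an HTTP request would go here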

User Agent

A user agent is a string sent in the User-Agent HTTP request header that identifies the client software making the request. Websites use it to distinguish browsers, bots, and scrapers.
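
For instance, a scraper can identify itself by setting the header explicitly. This sketch uses the third-party requests library; the bot name and contact URL are made up:

    import requests

    # Identify the client honestly; many sites allow or block based on this string.
    headers = {"User-Agent": "MyBot/1.0 (+https://example.com/bot-info)"}

    response = requests.get("https://example.com/", headers=headers, timeout=10)
    print(response.status_code)
    print(response.request.headers["User-Agent"])  # the header actually sent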

Start Scraping with 1,000 Free Credits

Get started with CrawlForge today. No credit card required.
