
Token

Definition

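A token is the basic unit of text that a language model processes. Before the model sees any text, a tokenizer splits it into tokens, each averaging roughly 4 characters or 0.75 words in English; the exact ratio varies with the tokenizer and the language. Token counts determine API costs and whether the text fits within the model's context limit.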
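The two rules of thumb above can be turned into a quick estimate. The following is a minimal sketch, assuming English text; `estimate_tokens` is a hypothetical helper written for illustration, not part of CrawlForge or any model API, and a real tokenizer will return different counts.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb: about 4 characters per token
WORDS_PER_TOKEN = 0.75   # rule of thumb: about 0.75 words per token


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for English text (heuristic, not a tokenizer)."""
    if not text:
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Average the two heuristics and round up so the estimate errs on the high side.
    return math.ceil((by_chars + by_words) / 2)
```

An estimate like this is only suitable for rough budgeting. For exact counts, use the tokenizer that matches the model you are calling.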

How It Relates to CrawlForge

Tokens matter when you use CrawlForge with AI agents because scraped content consumes context window space. A long web page can produce thousands of tokens, which can fill the agent's context and raise API costs.
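To make the cost side concrete, here is a rough sketch of the arithmetic, assuming the 4-characters-per-token rule of thumb and a per-million-token input price. The function name and the price used in the example are hypothetical; check your LLM provider's current pricing.

```python
CHARS_PER_TOKEN = 4  # rule-of-thumb ratio for English text


def estimate_input_cost(char_count: int, usd_per_million_tokens: float) -> float:
    """Estimate the input cost in USD of sending `char_count` characters to an LLM."""
    estimated_tokens = char_count / CHARS_PER_TOKEN
    return estimated_tokens / 1_000_000 * usd_per_million_tokens


# A 40,000-character page is about 10,000 tokens under this heuristic.
# The 3.0 USD per million tokens price below is a made-up figure for illustration.
page_cost = estimate_input_cost(40_000, usd_per_million_tokens=3.0)
```

The cost of a single page is small, but it grows linearly with page size and with the number of pages an agent reads in one session.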

CrawlForge tools like extract_text and summarize_content help manage token usage. extract_text returns only the main content without boilerplate, and summarize_content condenses long pages into concise summaries, reducing the token footprint sent to your LLM.

Related Terms

Context Window

The context window is the maximum amount of text (measured in tokens) that a language model can process in a single request. It includes both the input prompt and the generated output.
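Because the input prompt and the generated output share the same window, a request fits only if their sum stays within the limit. A minimal sketch of that check follows; the function names and the 8,000-token window in the example are hypothetical, not values from any specific model.

```python
def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the context window."""
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(context_window: int, max_output_tokens: int) -> int:
    """Return how many tokens remain for the prompt after reserving room for output."""
    return max(0, context_window - max_output_tokens)


# Example with a hypothetical 8,000-token window and 1,000 tokens reserved for output.
ok = fits_context(prompt_tokens=6_500, max_output_tokens=1_000, context_window=8_000)
too_big = fits_context(prompt_tokens=7_500, max_output_tokens=1_000, context_window=8_000)
```

Reserving the output budget first and then sizing the prompt against what remains avoids requests that are rejected or truncated at generation time.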

Large Language Model (LLM)

A large language model is a neural network trained on vast amounts of text data that can understand and generate human language. LLMs power AI assistants, code generators, and autonomous agents.

Prompt Engineering

Prompt engineering is the practice of designing and refining instructions given to language models to achieve desired outputs. It involves crafting system prompts, few-shot examples, and structured queries.

Fine-Tuning

Fine-tuning is the process of further training a pre-trained language model on a specific dataset to specialize its behavior for a particular task or domain. It adapts general-purpose models to targeted use cases.

