Developer Guide

llms.txt Explained: The New Standard for AI Crawlers (2026)

By Serpent API Team · 7 min read

If you have looked at AEO best practices in 2026, you have probably seen the term llms.txt and wondered whether it actually does anything. Short answer: yes. llms.txt is a small Markdown file at the root of your site that gives AI assistants a curated, machine-readable map of your most important content. It complements robots.txt and sitemap.xml, takes ten minutes to set up, and pays off in higher citation rates inside ChatGPT, Gemini, Claude, and Perplexity.

This guide explains what llms.txt is, the exact syntax, three production-ready examples, how to validate it, and how to monitor AI crawler activity once it is live. If you are also working on AI Overview optimisation, see our companion AEO vs SEO playbook for the broader strategy.

What is llms.txt?

llms.txt is a plain Markdown file placed at https://yourdomain.com/llms.txt. The format was proposed in late 2024 to solve a specific problem: AI assistants need a fast way to understand which pages on a site are most useful for their summaries and citations, without crawling the entire site. The file lists key pages with one-line descriptions in a structured format that any LLM can parse directly.

It is the AI-era equivalent of a hand-curated sitemap. A sitemap.xml says "here is every URL we have". llms.txt says "here are the URLs we want AI assistants to use as authoritative sources, ranked by importance".

Why llms.txt Exists

Three problems made the file necessary:

  1. Crawl budget for AI assistants. ChatGPT, Gemini and Perplexity cannot crawl the open web in real time for every query. They cache and summarise. A curated entry point like llms.txt makes them faster and more accurate.
  2. Disambiguation. A site may have ten pages that mention "pricing" but only one canonical pricing page. Sitemap.xml weights them equally; llms.txt highlights the canonical page.
  3. Signal-to-noise ratio. Cookie banners, navigation chrome, footer boilerplate are useless to a summariser. llms.txt points directly at the substantive content.

The file is not a replacement for any existing standard — it is a positive signal that sits alongside robots.txt (which gates access) and sitemap.xml (which lists everything).

The llms.txt Syntax

llms.txt is Markdown with three required sections:

# Site Name

> Short, single-paragraph description of what the site offers.

## Core pages

- [Page title](https://example.com/page) : optional one-line description
- [Another page](https://example.com/another)

## Optional

- [Less critical page](https://example.com/optional)

Three rules:

  1. Exactly one H1 line carrying the site name, followed by a single blockquote summary.
  2. H2 sections containing flat Markdown link lists; each entry is - [title](url), optionally followed by " : one-line description".
  3. A section titled "Optional" marks pages an assistant may skip when it needs a shorter context.

That is the entire spec. There is also an extension called llms-full.txt for sites that want to include the full Markdown body of every linked page in one file — useful for LLMs that want to ingest your knowledge base in a single fetch.
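Because the format is plain Markdown, a few lines of code can read it. Below is a minimal parser sketch; the function name, regex, and returned dictionary shape are our own choices, not part of any spec:

```python
import re

# Matches "- [title](url)" with an optional " : description" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?:\s*:\s*(?P<desc>.*))?$"
)

def parse_llms_txt(text):
    """Parse an llms.txt body into its name, summary, and link sections."""
    name, summary = None, None
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and name is None:
            name = line[2:].strip()          # the single H1 site name
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()       # the blockquote summary
        elif line.startswith("## "):
            current = line[3:].strip()       # start a new H2 section
            sections[current] = []
        else:
            m = LINK_RE.match(line)
            if m and current is not None:
                sections[current].append((m["title"], m["url"], m["desc"]))
    return {"name": name, "summary": summary, "sections": sections}
```

Feeding it Example 1 below would yield `name = "Acme Email Verifier"` and a `"Core pages"` section of three (title, url, description) tuples.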

Three Real Examples (Small, Medium, Large)

Example 1: A SaaS landing site (small)

# Acme Email Verifier

> Acme is a developer API that verifies email addresses in 80ms with 99.4% accuracy. Pay-as-you-go pricing from $0.0008/check.

## Core pages

- [Pricing](https://acmeverify.com/pricing) : Per-check pricing tiers and free trial details
- [Documentation](https://acmeverify.com/docs) : REST API reference with curl, Python, Node examples
- [Sign up](https://acmeverify.com/signup) : Free 1,000 verifications, no credit card required

## Optional

- [About](https://acmeverify.com/about)
- [Privacy](https://acmeverify.com/privacy)

Example 2: A developer documentation site (medium)

# Serpent API

> Serpent API is the cheapest Google SERP API in the world. Web search across Google, Bing, Yahoo, and DuckDuckGo from $0.03 per 10,000 pages, plus AI ranking across ChatGPT, Gemini, Claude, and Perplexity.

## Core pages

- [Homepage](https://apiserpent.com/) : Product overview and pricing snapshot
- [Pricing](https://apiserpent.com/pricing.html) : Full per-endpoint pricing across Default, Growth, Scale tiers
- [Documentation](https://apiserpent.com/docs.html) : Complete REST API reference
- [Playground](https://apiserpent.com/playground.html) : Live API tester for SERP, AI ranking, and social

## Product pages

- [Google SERP API](https://apiserpent.com/google-serp-api.html)
- [AI Ranking API](https://apiserpent.com/ai-rank-api.html)
- [News API](https://apiserpent.com/news-api.html)
- [Image Search API](https://apiserpent.com/image-search-api.html)

## Blog

- [The Cheapest SERP API in 2026](https://apiserpent.com/blog/cheapest-serp-api-comparison.html)
- [AEO vs SEO 2026 Playbook](https://apiserpent.com/blog/aeo-vs-seo-2026.html)
- [Build a Citation Tracker in Python](https://apiserpent.com/blog/ai-citation-tracker-python.html)

## Optional

- [About](https://apiserpent.com/about.html)
- [Refund Policy](https://apiserpent.com/refund.html)

Example 3: A large content site (60+ key pages)

For larger sites, group links by topic and keep the total file under 200 lines. Anything beyond that should live in llms-full.txt for assistants that want the long form.

# Example Knowledge Base

> Production-ready guides on web infrastructure, search APIs, and AI integration.

## Getting started

- [Quickstart](https://example.com/quickstart)
- [API authentication](https://example.com/auth)

## Search APIs

- [SERP API overview](https://example.com/serp)
- [Google AI Overview extraction](https://example.com/ai-overview)
- [Multi-engine aggregator](https://example.com/aggregator)

## AI integrations

- [LangChain integration](https://example.com/langchain)
- [OpenAI function calling](https://example.com/openai)

## Blog

- [Top 10 SERP API trends 2026](https://example.com/blog/trends)
- [Programmatic SEO with SERP data](https://example.com/blog/pseo)

## Optional

- [Changelog](https://example.com/changelog)
- [Status page](https://example.com/status)
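For a large site, writing the file by hand gets tedious; it is easier to render it from a page inventory. A small generator sketch, assuming you keep pages as (title, url, description) tuples grouped by section heading (the function name and data shape are ours, not a standard):

```python
def build_llms_txt(name, summary, sections):
    """Render an llms.txt body from grouped (title, url, description) tuples."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, desc in pages:
            entry = f"- [{title}]({url})"
            if desc:  # the " : description" suffix is optional
                entry += f" : {desc}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Run it in your build or deploy step so the file never drifts out of date when pages move.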

llms.txt vs robots.txt vs sitemap.xml

| File | Purpose | Audience | Format |
| --- | --- | --- | --- |
| robots.txt | Allow / disallow crawling per user-agent | All crawlers | Text directives |
| sitemap.xml | Comprehensive list of all canonical URLs | Search engine crawlers | XML |
| llms.txt | Curated, ranked entry points for AI summarisation | LLM-powered assistants | Markdown |

You should have all three. They serve different purposes and never conflict.


Setup in 10 Minutes

  1. Pick your top 5 to 15 pages. Homepage, pricing, docs, key product pages, top blog posts. Resist the urge to list everything.
  2. Write a one-paragraph site description. 30 to 60 words. State what the site is, who it serves, and the price or unique value if relevant. AI engines will quote this verbatim.
  3. Compose the file using the syntax above. Save it as llms.txt, plain text, lowercase, exactly that name.
  4. Upload to your site root. The file must be reachable at https://yourdomain.com/llms.txt — not in a subdirectory.
  5. Allowlist it on your CDN. Make sure the file is not behind a captcha, paywall, or geo-block.
  6. Add a self-link from robots.txt. Add a comment line: # LLM directory: https://yourdomain.com/llms.txt — not a directive, just a discovery aid.
  7. Submit to community llms.txt directories that aggregate adopters.
  8. Verify with curl. curl -I https://yourdomain.com/llms.txt should return 200 OK and Content-Type: text/markdown or text/plain.

Validation & Testing

There is no official W3C-style validator yet, but a few quick checks catch most issues:

  - The file returns HTTP 200 at the root URL with Content-Type text/markdown or text/plain.
  - There is exactly one H1 at the top, followed by a single blockquote summary.
  - Every link is an absolute https:// URL that itself returns 200 (no redirects to login or consent pages).
  - The file stays under roughly 200 lines; anything longer belongs in llms-full.txt.
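The offline parts of that checklist are easy to script. A sketch of a lint pass over the file body (the function name, messages, and 200-line threshold from the section above are our own conventions):

```python
import re

def validate_llms_txt(text):
    """Return a list of problems found in an llms.txt body (empty list = passed)."""
    problems = []
    lines = text.splitlines()
    # Exactly one H1 title line ("# ..." but not "## ...").
    if sum(1 for l in lines if l.startswith("# ")) != 1:
        problems.append("file must contain exactly one H1 title")
    # A blockquote summary should follow the title.
    if not any(l.startswith("> ") for l in lines):
        problems.append("missing blockquote summary")
    # Every Markdown link target must be an absolute URL.
    for url in re.findall(r"\]\((\S+?)\)", text):
        if not url.startswith(("https://", "http://")):
            problems.append(f"relative or malformed URL: {url}")
    # Keep the curated file short; long form belongs in llms-full.txt.
    if len(lines) > 200:
        problems.append("file longer than 200 lines; move extras to llms-full.txt")
    return problems
```

The live checks (HTTP status, Content-Type, link reachability) still need a fetch pass, e.g. the curl command from the setup steps.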

Monitoring AI Crawler Activity

Once your llms.txt is live, monitor whether AI crawlers are actually fetching it and whether your pages are showing up in AI answers. Two signals to track:

Server log analysis

Filter your access logs for the major AI user-agents:

  - GPTBot and OAI-SearchBot (OpenAI)
  - ClaudeBot (Anthropic)
  - PerplexityBot (Perplexity)

Note that Google-Extended is a robots.txt token rather than a crawler; Gemini's fetches arrive under the regular Googlebot user-agent.

Track the request count to /llms.txt per user-agent over time. A growing curve means AI engines are paying attention.
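A few lines of Python can pull that count out of combined-format access logs. A sketch, using simple substring matching on the user-agent names above (the function name and log format assumption are ours):

```python
from collections import Counter

# User-agent substrings for the AI crawlers worth tracking.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

def count_llms_txt_hits(log_lines):
    """Count /llms.txt fetches per AI user-agent in access-log lines."""
    hits = Counter()
    for line in log_lines:
        if "/llms.txt" not in line:
            continue  # only count fetches of the file itself
        for agent in AI_AGENTS:
            if agent in line:
                hits[agent] += 1
    return hits
```

Run it per day or per week (e.g. over rotated log files) and plot the totals; a growing curve is the signal you want.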

Citation rate inside AI answers

The ultimate proof that llms.txt is working is increased citation frequency in ChatGPT, Gemini, Claude, and Perplexity for your target keywords. Track citation rate weekly using the Serpent AI Ranking API and note the slope before and after llms.txt deployment. For a step-by-step tracker implementation see our AI Citation Tracker tutorial.

FAQ

What is llms.txt?

A Markdown file at the root of your domain that gives AI assistants a curated, machine-readable map of your most important pages.

Is llms.txt the same as robots.txt?

No. robots.txt gates access; llms.txt curates importance. They complement each other.

Do AI engines actually use llms.txt?

Adoption is growing in 2026. Several major AI assistants check for the file during crawl, and using it correlates with higher AEO citation rates. It is free to add and has no SEO downside.

Does llms.txt block AI training?

No, llms.txt is opt-in, not opt-out. To block AI training crawlers use robots.txt with directives for GPTBot, ClaudeBot, Google-Extended, and OAI-SearchBot.
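As a sketch, a robots.txt that blocks those crawlers while normal search crawling continues would look like this:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Note that OAI-SearchBot powers ChatGPT search citations rather than training, so many sites block GPTBot but leave OAI-SearchBot allowed to stay citable.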

Where do I put llms.txt?

The site root: https://yourdomain.com/llms.txt. Optionally also add llms-full.txt for the long-form version.

Track Whether Your llms.txt is Working

Serpent API gives you weekly citation reports across ChatGPT, Gemini, Claude, and Perplexity for any keyword. Watch your citation count grow as AEO improvements (including llms.txt) take effect. From $0.03 per 10,000 Google SERP pages, 10 free searches included.

Get Your Free API Key

Explore: AI Ranking API · SERP API · Pricing · Try in Playground