llms.txt Explained: The New Standard for AI Crawlers (2026)
If you have looked at AEO best practices in 2026, you have probably seen the term llms.txt and wondered whether it actually does anything. Short answer: yes. llms.txt is a small Markdown file at the root of your site that gives AI assistants a curated, machine-readable map of your most important content. It complements robots.txt and sitemap.xml, takes ten minutes to set up, and can pay off in higher citation rates inside ChatGPT, Gemini, Claude, and Perplexity.
This guide explains what llms.txt is, the exact syntax, three production-ready examples, how to validate it, and how to monitor AI crawler activity once it is live. If you are also working on AI Overview optimisation, see our companion AEO vs SEO playbook for the broader strategy.
What is llms.txt?
llms.txt is a plain Markdown file placed at https://yourdomain.com/llms.txt. The format was proposed in late 2024 to solve a specific problem: AI assistants need a fast way to understand which pages on a site are most useful for their summaries and citations, without crawling the entire site. The file lists key pages with one-line descriptions in a structured format that any LLM can parse directly.
It is the AI-era equivalent of a hand-curated sitemap. A sitemap.xml says "here is every URL we have". llms.txt says "here are the URLs we want AI assistants to use as authoritative sources, ranked by importance".
Why llms.txt Exists
Three problems made the file necessary:
- Crawl budget for AI assistants. ChatGPT, Gemini and Perplexity cannot crawl the open web in real time for every query. They cache and summarise. A curated entry point like llms.txt makes them faster and more accurate.
- Disambiguation. A site may have ten pages that mention "pricing" but only one canonical pricing page. Sitemap.xml weights them equally; llms.txt highlights the canonical page.
- Signal-to-noise ratio. Cookie banners, navigation chrome, and footer boilerplate are useless to a summariser. llms.txt points directly at the substantive content.
The file is not a replacement for any existing standard — it is a positive signal that sits alongside robots.txt (which gates access) and sitemap.xml (which lists everything).
The llms.txt Syntax
llms.txt is Markdown with three required sections:
```
# Site Name
> Short, single-paragraph description of what the site offers.
## Core pages
- [Page title](https://example.com/page) : optional one-line description
- [Another page](https://example.com/another)
## Optional
- [Less critical page](https://example.com/optional)
```
Three rules:
- The H1 is your site name.
- The blockquote (`> ...`) is a one-paragraph site description that AI assistants will use verbatim when summarising your brand.
- H2 sections group links. Common conventions: `## Core pages`, `## Documentation`, `## Blog`, `## Optional`.
That is the entire spec. There is also an extension called llms-full.txt for sites that want to include the full Markdown body of every linked page in one file — useful for LLMs that want to ingest your knowledge base in a single fetch.
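The format is simple enough to parse with a few lines of regex, which is part of why LLMs can consume it directly. Here is a minimal parser sketch; the `parse_llms_txt` name and the returned dictionary shape are my own choices, not part of the spec:

```python
import re

# Matches "- [Title](url) : optional description" list items.
LINK_RE = re.compile(
    r"-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?:\s*:\s*(?P<desc>.*))?"
)

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt file into {name, description, sections}."""
    name = None
    description_lines = []
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and name is None:
            name = line[2:].strip()          # H1 = site name
        elif line.startswith("> "):
            description_lines.append(line[2:].strip())  # blockquote = description
        elif line.startswith("## "):
            current = line[3:].strip()       # H2 = link group
            sections[current] = []
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                sections[current].append({
                    "title": m.group("title"),
                    "url": m.group("url"),
                    "description": m.group("desc") or "",
                })
    return {
        "name": name,
        "description": " ".join(description_lines),
        "sections": sections,
    }
```

Feeding it any of the examples below yields the site name, the blockquote, and a dict of link groups, each entry carrying its title, URL, and optional one-line description.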
Three Real Examples (Small, Medium, Large)
Example 1: A SaaS landing site (small)
```
# Acme Email Verifier
> Acme is a developer API that verifies email addresses in 80ms with 99.4% accuracy. Pay-as-you-go pricing from $0.0008/check.
## Core pages
- [Pricing](https://acmeverify.com/pricing) : Per-check pricing tiers and free trial details
- [Documentation](https://acmeverify.com/docs) : REST API reference with curl, Python, Node examples
- [Sign up](https://acmeverify.com/signup) : Free 1,000 verifications, no credit card required
## Optional
- [About](https://acmeverify.com/about)
- [Privacy](https://acmeverify.com/privacy)
```
Example 2: A developer documentation site (medium)
```
# Serpent API
> Serpent API is the cheapest Google SERP API in the world. Web search across Google, Bing, Yahoo, and DuckDuckGo from $0.03 per 10,000 pages, plus AI ranking across ChatGPT, Gemini, Claude, and Perplexity.
## Core pages
- [Homepage](https://apiserpent.com/) : Product overview and pricing snapshot
- [Pricing](https://apiserpent.com/pricing.html) : Full per-endpoint pricing across Default, Growth, Scale tiers
- [Documentation](https://apiserpent.com/docs.html) : Complete REST API reference
- [Playground](https://apiserpent.com/playground.html) : Live API tester for SERP, AI ranking, and social
## Product pages
- [Google SERP API](https://apiserpent.com/google-serp-api.html)
- [AI Ranking API](https://apiserpent.com/ai-rank-api.html)
- [News API](https://apiserpent.com/news-api.html)
- [Image Search API](https://apiserpent.com/image-search-api.html)
## Blog
- [The Cheapest SERP API in 2026](https://apiserpent.com/blog/cheapest-serp-api-comparison.html)
- [AEO vs SEO 2026 Playbook](https://apiserpent.com/blog/aeo-vs-seo-2026.html)
- [Build a Citation Tracker in Python](https://apiserpent.com/blog/ai-citation-tracker-python.html)
## Optional
- [About](https://apiserpent.com/about.html)
- [Refund Policy](https://apiserpent.com/refund.html)
```
Example 3: A large content site (60+ key pages)
For larger sites, group links by topic and keep the total file under 200 lines. Anything beyond that should live in llms-full.txt for assistants that want the long form.
```
# Example Knowledge Base
> Production-ready guides on web infrastructure, search APIs, and AI integration.
## Getting started
- [Quickstart](https://example.com/quickstart)
- [API authentication](https://example.com/auth)
## Search APIs
- [SERP API overview](https://example.com/serp)
- [Google AI Overview extraction](https://example.com/ai-overview)
- [Multi-engine aggregator](https://example.com/aggregator)
## AI integrations
- [LangChain integration](https://example.com/langchain)
- [OpenAI function calling](https://example.com/openai)
## Blog
- [Top 10 SERP API trends 2026](https://example.com/blog/trends)
- [Programmatic SEO with SERP data](https://example.com/blog/pseo)
## Optional
- [Changelog](https://example.com/changelog)
- [Status page](https://example.com/status)
```
llms.txt vs robots.txt vs sitemap.xml
| File | Purpose | Audience | Format |
|---|---|---|---|
| robots.txt | Allow / disallow crawling per user-agent | All crawlers | Text directives |
| sitemap.xml | Comprehensive list of all canonical URLs | Search engine crawlers | XML |
| llms.txt | Curated, ranked entry points for AI summarisation | LLM-powered assistants | Markdown |
You should have all three. They serve different purposes and never conflict.
Once your llms.txt is live, monitor whether AI engines are actually citing your pages. The Serpent AI Ranking API queries ChatGPT, Gemini, Claude, and Perplexity for any keyword and returns the citation URLs. Get 10 free queries to test →
Setup in 10 Minutes
- Pick your top 5 to 15 pages. Homepage, pricing, docs, key product pages, top blog posts. Resist the urge to list everything.
- Write a one-paragraph site description. 30 to 60 words. State what the site is, who it serves, and the price or unique value if relevant. AI engines will quote this verbatim.
- Compose the file using the syntax above. Save it as `llms.txt` (plain text; no extension other than .txt).
- Upload to your site root. The file must be reachable at `https://yourdomain.com/llms.txt`, not in a subdirectory.
- Add to your CDN's whitelist. Make sure the file is not behind a captcha, paywall, or geo-block.
- Add a self-link from robots.txt. Add a comment line `# LLM directory: https://yourdomain.com/llms.txt` (not a directive, just a discovery aid).
- Submit to llms.txt directories that aggregate adopters.
- Verify with curl. `curl -I https://yourdomain.com/llms.txt` should return `200 OK` and `Content-Type: text/markdown` or `text/plain`.
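If you want the final verification step in your deploy pipeline rather than run by hand, the curl check can be scripted. A minimal Python sketch; the `check_llms_txt` helper name and the custom user-agent string are illustrative, not part of any spec:

```python
import urllib.request

def is_valid_response(status: int, content_type: str) -> bool:
    """True if the response looks like a well-served llms.txt."""
    media_type = content_type.split(";")[0].strip()
    return status == 200 and media_type in ("text/markdown", "text/plain")

def check_llms_txt(domain: str) -> bool:
    """HEAD-request /llms.txt at the site root and validate the response."""
    url = f"https://{domain}/llms.txt"
    req = urllib.request.Request(
        url, method="HEAD",
        headers={"User-Agent": "llms-txt-check/1.0"},  # some CDNs block blank UAs
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return is_valid_response(resp.status, resp.headers.get("Content-Type", ""))
```

A `False` return usually means the file is missing, redirected, or served with an HTML error page instead of plain text.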
Validation & Testing
There is no official W3C-style validator yet, but a few checks catch 95% of issues:
- Markdown lint. Run `markdownlint llms.txt` for syntax problems.
- Link checker. Every URL in the file should return 200. Use `lychee` or `htmlproofer`.
- Description length. The blockquote should be 30 to 80 words. Too short = uninformative; too long = AI engines truncate it.
- File size. Keep llms.txt under 50 KB. If your site needs more, move detail to llms-full.txt.
- Test in an LLM. Ask ChatGPT or Gemini to "summarise `https://yourdomain.com/llms.txt`" and verify the response matches your intent.
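Most of these checks can be automated offline. A minimal lint sketch along these lines; the `lint_llms_txt` name is mine, and the 50 KB and 30-to-80-word thresholds are this article's recommendations, not a spec requirement:

```python
import re
from urllib.parse import urlparse

def lint_llms_txt(text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means pass."""
    problems = []
    # File size: keep under 50 KB, per the guideline above.
    if len(text.encode("utf-8")) > 50_000:
        problems.append("file exceeds 50 KB; move detail to llms-full.txt")
    # Description length: blockquote should be 30 to 80 words.
    quote = " ".join(
        line[2:] for line in text.splitlines() if line.startswith("> ")
    )
    words = len(quote.split())
    if not 30 <= words <= 80:
        problems.append(f"blockquote is {words} words; aim for 30 to 80")
    # Link sanity: every Markdown link should be an absolute https URL.
    for url in re.findall(r"\]\(([^)]+)\)", text):
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"suspicious link: {url}")
    return problems
```

This only checks shape; pair it with a live link checker such as `lychee` to confirm each URL actually returns 200.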
Monitoring AI Crawler Activity
Once your llms.txt is live, monitor whether AI crawlers are actually fetching it and whether your pages are showing up in AI answers. Two signals to track:
Server log analysis
Filter your access logs for the major AI user-agents:
- `OAI-SearchBot` (ChatGPT search / Atlas)
- `GPTBot` (ChatGPT training)
- `ClaudeBot` (Anthropic)
- `Google-Extended` (Google AI training)
- `PerplexityBot` (Perplexity)
- `Bytespider` (TikTok / ByteDance)
Track the request count to /llms.txt per user-agent over time. A growing curve means AI engines are paying attention.
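Assuming standard combined-format access logs, that tally can be scripted with a substring scan per line; the `count_llms_hits` helper name is hypothetical, and the bot list mirrors the user-agents above:

```python
from collections import Counter

# Major AI crawler user-agents, as listed above.
AI_BOTS = [
    "OAI-SearchBot", "GPTBot", "ClaudeBot",
    "Google-Extended", "PerplexityBot", "Bytespider",
]

def count_llms_hits(log_lines) -> Counter:
    """Count requests to /llms.txt per AI user-agent in access-log lines."""
    counts = Counter()
    for line in log_lines:
        if "/llms.txt" not in line:
            continue
        for bot in AI_BOTS:
            if bot in line:       # user-agent field contains the bot token
                counts[bot] += 1
                break
    return counts
```

Run it over each day's log and plot the per-bot counts over time; a growing curve is the signal you are looking for.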
Citation rate inside AI answers
The ultimate proof that llms.txt is working is increased citation frequency in ChatGPT, Gemini, Claude, and Perplexity for your target keywords. Track citation rate weekly using the Serpent AI Ranking API and note the slope before and after llms.txt deployment. For a step-by-step tracker implementation see our AI Citation Tracker tutorial.
FAQ
What is llms.txt?
A Markdown file at the root of your domain that gives AI assistants a curated, machine-readable map of your most important pages.
Is llms.txt the same as robots.txt?
No. robots.txt gates access; llms.txt curates importance. They complement each other.
Do AI engines actually use llms.txt?
Adoption is growing in 2026. Several major AI assistants check for the file during crawl, and adopters report higher AEO citation rates. It is free to add and has no SEO downside.
Does llms.txt block AI training?
No, llms.txt is opt-in, not opt-out. To block AI training crawlers use robots.txt with directives for GPTBot, ClaudeBot, Google-Extended, and OAI-SearchBot.
Where do I put llms.txt?
The site root: https://yourdomain.com/llms.txt. Optionally also add llms-full.txt for the long-form version.
Track Whether Your llms.txt is Working
Serpent API gives you weekly citation reports across ChatGPT, Gemini, Claude, and Perplexity for any keyword. Watch your citation count grow as AEO improvements (including llms.txt) take effect. From $0.03 per 10,000 Google SERP pages, 10 free searches included.
Get Your Free API Key · Explore: AI Ranking API · SERP API · Pricing · Try in Playground

