
llms.txt and the Files That Move AI Citations

8-week experiment: llms.txt, schema markup, and FAQ optimization rolled out across 3 sites. Citation delta measured weekly. Does it move AI citations?

Season 3 · Experiment Queued
Baseline measurement starts when Season 3 launches. The 8-week experiment runs in full view. Weekly citation data gets published as it lands. Subscribe to follow the experiment live.

Key Takeaways

  • llms.txt is a proposal, not a standard. Most LLMs are not reading it yet. — llms.txt (proposed by Jeremy Howard) gives AI systems a curated entry point to your site's content. Adoption is growing fast. That does not guarantee any specific LLM is using it today. This experiment measures whether implementing it moves citation rates in practice, not in theory.
  • Schema markup and FAQ structure may matter more than llms.txt alone. — The experiment isolates three variables: llms.txt, expanded schema (FAQPage, Article, HowTo), and structured FAQ content. The goal: separate which element (if any) is actually driving citation improvement.
  • Baseline measurement is the hardest part of this experiment. — You can't measure a change without a reliable before state. Most coverage of llms.txt implementation skips the baseline entirely and claims a "lift" from noise. This experiment runs 4 weeks of baseline measurement across all three sites before implementing anything.

What Everyone Is Claiming and What We're Testing

The marketing story around llms.txt goes: implement this file and AI systems will cite you more. Like most one-sentence explanations of a complex system, it's partially true, mostly misleading, and largely untested at the site level by the people who have adopted it.

The claim has a plausible mechanism. RAG-based AI systems like Perplexity actively crawl the web for retrieval. A well-structured llms.txt gives crawlers an efficient map to your best content, which could improve how thoroughly your site is represented in the retrieval index. But "plausible mechanism" is not the same as "measured effect." This experiment is here to find out whether the effect is real at the site level.

Why This Experiment Exists

There's a gap in the llms.txt conversation. LinkedIn is full of posts celebrating adoption: look, we shipped llms.txt, check it out. Practitioner blogs describe the implementation in detail. What almost nobody has published is a controlled measurement of what happens to citation rates after rolling it out. Adoption has outrun measurement, and the people talking about llms.txt loudest are usually not the ones who can tell you whether it worked.

Part of this is understandable. Running a controlled experiment across multiple sites takes weeks, requires discipline about baseline measurement, and produces results that often don't flatter the intervention. Publishing "we added llms.txt and the dashboard went up" is easier than publishing "we added llms.txt, citations moved 3% on Perplexity with 8% variance across runs, and were statistically indistinguishable from baseline on ChatGPT." That second answer is the one that actually helps you decide where to spend engineering time.

So this experiment exists to close that gap in the only way I know how: pick three sites, measure a real baseline, ship the interventions, measure again, and publish whatever the numbers say. Even if the numbers are unflattering to the hypothesis. Especially then.

Three Variables, Not One

Most practitioners who implement "llms.txt optimization" do several things at the same time: they add the file, they clean up their schema markup, and they restructure content into Q&A format. Then they observe improved citation rates and attribute the whole move to llms.txt. That isn't an experiment. It's a confounded observation.

This experiment isolates the three most commonly bundled interventions:

1. llms.txt implementation

A well-structured /llms.txt file following the Jeremy Howard spec: site summary, key page index, content descriptions. Deployed at /llms.txt and /llms-full.txt.
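For reference, a minimal skeleton in the shape the llmstxt.org spec describes (H1 site name, blockquote summary, H2 link sections, with "Optional" reserved for skippable material). The page names and URLs below are placeholders, not the experiment's actual files:

```markdown
# Example Site

> One-paragraph summary of what the site covers and who it is for.

Optional free-form notes an AI system should know up front.

## Key pages

- [Guide to topic X](https://example.com/guide-x): what the page covers, in one line
- [About](https://example.com/about): who runs the site and why it exists

## Optional

- [Changelog](https://example.com/changelog): lower-priority material a model can skip
```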

2. Expanded schema markup

Adding or improving FAQPage, Article, HowTo, and BreadcrumbList JSON-LD across the tested pages. No content changes — only structured data additions.
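For illustration, a minimal FAQPage block in schema.org's JSON-LD form, as it would be embedded in a page's HTML (the question and answer text here are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is llms.txt?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A proposed standard: a Markdown file at /llms.txt that gives AI crawlers a curated map of a site's content."
      }
    }
  ]
}
</script>
```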

3. FAQ content restructuring

Converting existing content into explicit Q&A format — distinct questions with self-contained answers that can be extracted as citations. The same information, restructured for extractability.
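To make "restructured for extractability" concrete, a before/after sketch (the content is invented, shown for shape only):

```markdown
<!-- Before: the answer is buried mid-paragraph -->
Our testing across several deployments suggests that most teams see
indexing complete within a few days, though results vary by site.

<!-- After: a distinct question with a self-contained answer -->
## How long does indexing take?

Indexing typically completes within a few days of deployment, based on
testing across several deployments. Results vary by site.
```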

Each site in the experiment receives a different combination of interventions. That lets me attribute effect to specific variables instead of the bundle.

Experiment Timeline

Weeks 1–4 · Baseline measurement

No changes to any of the three sites. Weekly citation measurement across ChatGPT, Perplexity, Gemini, and Claude using a fixed query set of 15 topic-relevant questions per site. This establishes the pre-intervention state and accounts for natural variance.
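A sketch of what one weekly measurement pass looks like, assuming a hypothetical ask_model() helper that sends one question to one LLM and returns the URLs it cited. Each vendor's actual API differs; this shows the shape of the harness, not a working integration:

```python
from urllib.parse import urlparse

MODELS = ["chatgpt", "perplexity", "gemini", "claude"]

def ask_model(model: str, question: str) -> list[str]:
    """Hypothetical stand-in: send `question` to `model`, return cited URLs."""
    raise NotImplementedError("wire up each vendor's API here")

def citation_rate(model: str, queries: list[str], domain: str) -> float:
    """Fraction of queries where at least one cited URL is on `domain`."""
    hits = 0
    for q in queries:
        for url in ask_model(model, q):
            netloc = urlparse(url).netloc
            # exact match or subdomain, so "notctaio.dev" doesn't count
            if netloc == domain or netloc.endswith("." + domain):
                hits += 1
                break
    return hits / len(queries)

# Weekly run: the fixed 15-question set per site, one rate per (site, model) pair.
# rates = {m: citation_rate(m, SITE_QUERIES["ctaio.dev"], "ctaio.dev") for m in MODELS}
```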

Week 4 · Interventions deployed

Each site receives its assigned intervention (or combination). No other changes to content, publishing cadence, or technical infrastructure during the measurement window.

Weeks 5–8 · Post-implementation measurement

Same query set, same measurement cadence. Delta vs baseline calculated for each week and each LLM independently. Results published weekly as part of the Season 3 experiment.
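And a sketch of the delta calculation itself, assuming four baseline and four post-intervention weekly rates per (site, model) pair. The point of reporting baseline spread alongside the delta is that a "lift" smaller than the baseline's own variance should be read as noise:

```python
from statistics import mean, stdev

def weekly_delta(baseline: list[float], post: list[float]) -> dict:
    """Compare post-intervention citation rates (0.0-1.0) against baseline.

    `baseline` and `post` are weekly rates for one site on one LLM.
    """
    base_mean = mean(baseline)
    return {
        "baseline_mean": base_mean,
        "baseline_stdev": stdev(baseline),  # natural week-to-week noise
        "post_mean": mean(post),
        "delta": mean(post) - base_mean,
        "delta_per_week": [p - base_mean for p in post],
    }

# Illustrative numbers only:
# weekly_delta([0.20, 0.27, 0.20, 0.27], [0.27, 0.33, 0.27, 0.33])
```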

The Token Budget Problem Nobody Mentions

An llms.txt file has a real upper size limit that nobody talks about: the consuming LLM's context window. A 50,000-token llms.txt is useless for a model with a 32,000-token context. Even for long-context models, a 200,000-token llms-full.txt forces the LLM to decide what to read — and if it's deciding, you've lost the curation benefit you were trying to provide.

Practical rule of thumb from building these for ctaio.dev and two other sites: keep the main llms.txt under 10,000 tokens and limit it to a site summary, a navigation map, and pointers to key pages. Keep llms-full.txt under 80,000 tokens and structure it so any single section is self-contained. If a model reads one chunk, that chunk should be useful on its own. The "paste the whole site into one file" approach fights the retrieval mechanism you're trying to help.
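A quick way to check those budgets before deploying, using the tiktoken library (cl100k_base is one common tokenizer; counts vary a few percent between model families, so treat the numbers as estimates):

```python
import tiktoken  # pip install tiktoken

def token_count(path: str) -> int:
    """Approximate token count of a file using the cl100k_base encoding."""
    enc = tiktoken.get_encoding("cl100k_base")
    with open(path, encoding="utf-8") as f:
        return len(enc.encode(f.read()))

for name, budget in [("llms.txt", 10_000), ("llms-full.txt", 80_000)]:
    n = token_count(name)
    print(f"{name}: {n} tokens ({'OK' if n <= budget else 'over budget'})")
```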

What ctaio.dev Does In This Experiment

ctaio.dev is one of the three sites in the experiment. This page, and the Season 3 podcast series, are part of the content infrastructure being measured. If you're reading this before Season 3 drops, you're in the baseline period. The llms.txt file for ctaio.dev is already live (check /llms.txt). Schema markup is in place on every key page. What the experiment adds is the controlled measurement framework to quantify what effect any of it actually has.

FAQ

What is llms.txt?

llms.txt is a proposed standard (not yet an official web standard) that gives AI systems a structured, human-and-machine-readable overview of your website's content. Analogous to robots.txt (which tells crawlers what not to crawl) and sitemap.xml (which tells search engines what to index), llms.txt tells LLM systems what your site covers, how your content is organized, and which files are most relevant for understanding your brand or expertise. It was proposed by Jeremy Howard (fast.ai) in 2024 and has seen rapid adoption across developer and AI-native sites.

Does implementing llms.txt improve LLM citations?

Unknown. That's why this experiment exists. The theoretical mechanism is clear: RAG-based systems like Perplexity actively crawl and index web content for retrieval, and a well-formed llms.txt gives those crawlers an efficient entry point to your most important pages. For knowledge baked in at training time (the ChatGPT and Claude base models), llms.txt has no direct effect: it can't retroactively change training data. The experiment measures whether the practical effect on Perplexity and other RAG-first systems is detectable at the site level. Results publish with Season 3.

What should go in an llms.txt file?

The spec (llmstxt.org) recommends: an H1 with the site name, a short blockquote with the site summary, a section listing key pages with brief descriptions, and optional additional sections for specific content types. The file should be in Markdown, served at /llms.txt, and kept concise (ideally under 10,000 tokens). The full version (llms-full.txt) can include more comprehensive content for systems that can handle larger context.

What is the difference between llms.txt and schema markup?

Schema markup (JSON-LD embedded in HTML) tells search engines and structured data parsers about the type and properties of your content. It's primarily read by search engine crawlers and influences traditional search results, rich snippets, and knowledge graph entities. llms.txt is a standalone file read by AI-native crawlers as an entry point to your content map. They operate at different layers: schema describes individual pages, llms.txt describes the site as a whole. Both are tested in this experiment.

Which sites are in the experiment?

Three sites with different domain authority profiles, content volume, and existing citation rates. Site selection and baseline data publish with the Season 3 experiment report. Revealing the sites before results would compromise the measurement by incentivizing outside changes that could confound the data.

How long does the experiment run?

4 weeks of baseline measurement (no changes), then implementation of the three optimizations, then 4 weeks of post-implementation measurement. Total runtime: 8 weeks. Citation rate is measured weekly for each site across ChatGPT, Perplexity, Gemini, and Claude using a standardized query set of 15 topic-relevant questions per site.
