llms.txt Files and AI Visibility: What the Data Shows

There has been a lot of noise around llms.txt since the format was proposed as a way for site owners to communicate with AI crawlers. The idea is simple enough: a plain-text file, written in Markdown, at the root of your domain, telling AI systems what content exists and which pages are most worth using. Tidy, logical, appealing to anyone who spent years working with robots.txt.

The problem is that an analysis of server logs across 137,000 domains, published by Ahrefs in June 2026, found that 97% of llms.txt files receive no crawler visits at all. Not low traffic. Essentially none. That is a significant finding, and it deserves a clear-headed response rather than either dismissal or panic.

Why the Adoption Numbers Are Misleading

The instinct when you see a figure like 97% is to conclude the format is dead. That is probably too hasty. A more useful read is that most sites implementing llms.txt are doing so speculatively - adding the file because it was discussed in SEO communities, without any evidence that the AI systems they are targeting actually request it. Supply massively outpacing demand is not unusual in early technical SEO cycles.

What the data does suggest is that AI systems are not yet treating llms.txt as a routing signal in the way that robots.txt functions for traditional crawlers. Whether that changes as AI crawl behaviour matures is genuinely uncertain. Google, for its part, has not formally endorsed the format as a ranking or citation input for AI Overviews. That matters, because Google AI Overviews remains the highest-volume AI search surface for most UK brands.

What AI Crawlers Are Actually Reading

If llms.txt is not being read, the obvious question is: what is? The Ahrefs study used bot analytics to examine which user agents were hitting the 137,000 domains in its dataset. AI crawlers - including those associated with OpenAI, Anthropic, and Google - are crawling the web, but they are doing so through conventional page-level requests rather than by looking for declarative instruction files.
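The same kind of check can be run on a single site's access logs. What follows is a minimal sketch, not the method Ahrefs used: it assumes logs in the common "combined" format, and the function name tally_ai_requests, the token list, and the sample lines are all made up for illustration. GPTBot, ClaudeBot, and PerplexityBot are publicly documented crawler user-agent tokens, but the list is deliberately incomplete.

```python
import re
from collections import Counter

# Substrings identifying a few publicly documented AI crawlers.
# Illustrative and incomplete; extend it for a real audit.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Pulls the request path and user agent out of a combined-format log line
# (assumed format: ... "METHOD path protocol" status bytes "referer" "agent").
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally_ai_requests(lines):
    """Count AI crawler requests, split into llms.txt hits and page hits."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # line is not in the assumed format
        agent = match.group("agent")
        crawler = next((t for t in AI_CRAWLER_TOKENS if t in agent), None)
        if crawler is None:
            continue  # not one of the crawlers being tracked
        path = match.group("path").split("?", 1)[0]  # drop any query string
        kind = "llms.txt" if path == "/llms.txt" else "page"
        counts[(crawler, kind)] += 1
    return counts


# Hypothetical log lines, invented for this example.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:10:01:00 +0000] "GET /blog/post?utm=1 HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 200 800 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:10:03:00 +0000] "GET /llms.txt HTTP/1.1" 200 800 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
    '203.0.113.20 - - [01/Jan/2025:10:04:00 +0000] "GET / HTTP/1.1" 200 4000 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
]

counts = tally_ai_requests(sample_log)
for (crawler, kind), n in sorted(counts.items()):
    print(f"{crawler:15} {kind:10} {n}")
```

If the llms.txt rows stay at zero over a meaningful period while the page rows keep growing, that is the pattern the study describes, observed on your own domain. User-agent strings can be spoofed, so treat the tallies as indicative rather than exact.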

This is consistent with what we see when auditing sites for GEO performance. The signals that correlate with AI citations are not file-based declarations of intent - they are content-level signals: clear factual statements, structured headings, cited sources, schema markup applied meaningfully, and prose that directly answers specific questions. ChatGPT, Perplexity, and Gemini are all drawing on crawled and indexed content. They are reading your pages, not your instructions about your pages.

There is an analogy with meta keywords here. Meta keywords were a genuine SEO mechanism before widespread abuse made them worthless and search engines stopped using them. llms.txt may have a useful future role - but right now, betting visibility budget on it is premature, and the data backs that up.

The Effort Allocation Problem

For marketing teams working with limited resource, the llms.txt findings present a practical prioritisation question. If you have capacity to do one thing this quarter to improve AI search visibility, spending it on generating and maintaining an llms.txt file is almost certainly not the right call - based on current evidence.

The activities that demonstrably affect whether a brand gets cited in AI-generated results are harder and slower, but they are grounded in what AI systems actually process. These include building topical depth on core subject areas, ensuring that key factual claims about your products and services are stated plainly on crawlable pages, and earning mentions in the kinds of sources - publishers, trade bodies, review platforms - that AI models weight heavily. None of that is glamorous. All of it takes longer than dropping a text file in a root directory.

A Narrow But Real Use Case Remains

It would be wrong to write off llms.txt entirely. There is a genuine use case for larger organisations managing complex content permissions - for example, a publisher that licenses content commercially and wants to signal explicitly what AI systems may and may not train on or cite. In that context, the file functions more as a legal and commercial instrument than an SEO tactic.

The 3% of llms.txt files that are being crawled - per the Ahrefs data - are likely concentrated in a small number of high-authority domains where AI crawlers have already established crawl priority. If your domain sits in that tier, maintaining an accurate llms.txt file is probably worthwhile as a hygiene measure. For everyone else, the cost-benefit is unfavourable right now.

What This Means for How You Report on AI Visibility

One of the more subtle implications of the llms.txt data is what it reveals about how AI visibility is being measured - or not. If teams are implementing technical signals without any way to verify whether those signals are being read, that is a measurement gap. Reporting that you have deployed llms.txt tells you nothing about whether it has affected your citation rate in Perplexity or your appearance rate in Google AI Overviews.

Proper AI visibility measurement requires tracking citation frequency across AI platforms directly - through manual sampling, prompt testing, and where available, platform-level data. Google Search Console now surfaces some AI Overviews data. Perplexity's citation patterns can be tested systematically. These are the metrics that actually reflect performance, rather than the presence or absence of a file that, in 97% of cases, no crawler ever requests.
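The citation tracking described above reduces to simple arithmetic once the sampling has been done. This is a rough sketch under stated assumptions: the PromptResult record, the citation_rate function, the prompts, and the domains are all hypothetical, and the cited domains are assumed to have been recorded by hand from each platform's answer rather than pulled from any platform API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PromptResult:
    platform: str        # e.g. "Perplexity"
    prompt: str          # the question put to the platform
    cited_domains: list  # domains cited in the answer, recorded manually


def citation_rate(results, domain):
    """Per platform, the share of sampled prompts whose answer cited `domain`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for result in results:
        totals[result.platform] += 1
        if domain in result.cited_domains:
            hits[result.platform] += 1
    return {platform: hits[platform] / totals[platform] for platform in totals}


# Hypothetical sample, invented for this example.
sample = [
    PromptResult("Perplexity", "best payroll software uk", ["example.co.uk", "example.org"]),
    PromptResult("Perplexity", "payroll software for charities", ["example.org"]),
    PromptResult("Perplexity", "how does auto-enrolment work", ["example.net"]),
    PromptResult("Perplexity", "payroll year end checklist", []),
    PromptResult("Google AI Overviews", "best payroll software uk", ["example.co.uk"]),
    PromptResult("Google AI Overviews", "payroll year end checklist", ["example.net"]),
]

rates = citation_rate(sample, "example.co.uk")
for platform, rate in rates.items():
    print(f"{platform}: cited in {rate:.0%} of sampled prompts")
```

Run against the same prompt set each month, a figure like this gives a trend line that the presence or absence of a file cannot. The sample needs to be large enough, and the prompts stable enough, for month-on-month changes to mean anything.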

The Practical Takeaway

llms.txt is not harmful. If your developers can implement it in an hour and it can be kept up to date without ongoing cost, there is no strong reason to remove it. But it should not be on your AI visibility roadmap as a meaningful action item until there is credible evidence that the major AI systems are consuming it at scale.

The Ahrefs data is a useful corrective to a trend that has been driven more by SEO community enthusiasm than by evidence of impact. AI visibility is a real and measurable goal - but it is built through content quality, source authority, and structural clarity, not through instruction files that the systems in question are largely not reading.