The Naming Problem
AI search optimization has a branding problem. In three years the field has generated GEO, AEO, LLM-SEO, LLMO, generative search optimization, and conversational SEO. Most of these describe overlapping practices under different names coined by different vendors. Before you can optimize, you need to know what you're optimizing for.
Quick glossary so the rest of this piece reads cleanly: GEO = Generative Engine Optimization. AEO = Answer Engine Optimization. LLM-SEO = optimization specifically for LLM training data inclusion and RAG retrieval. All three sit inside the same broad problem (getting cited by AI systems) and each one emphasizes different levers.
GEO (Generative Engine Optimization): Optimizing content to appear in AI-generated answers. Origin: Princeton/Georgia Tech research, 2023. Key tactics: cite sources with actual names and dates, include specific numbers, write conclusions as standalone sentences that can be lifted and quoted verbatim.
AEO (Answer Engine Optimization): Optimizing content to appear as the direct answer in voice search and featured snippets. Origin: pre-LLM, ~2018. Key tactics: Q&A format, concise factual statements, FAQ and HowTo schema markup, conversational query matching.
LLM-SEO: Optimizing content for LLM training data inclusion and RAG retrieval. Origin: 2024+. Key tactics: topical authority at scale, freshness signals, llms.txt and schema for machine consumption, earn citations on high-crawl-frequency domains.
The Experiment Design
Three variants of the same article. Same topic, same word count (±5%), same publish date, same domain authority. The only variable that changes between variants is the optimization approach.
- Variant A: GEO-optimized — Structured claims, cited statistics, fluent extractable sentences, quotable conclusions.
- Variant B: AEO-optimized — Q&A format throughout, FAQ schema markup, concise definitions for every key term, HowTo schema where applicable.
- Variant C: LLM-SEO optimized — llms.txt implemented, enhanced schema markup, content structured for RAG chunking, authority signals prioritized.
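Variant B's FAQ schema markup is the most mechanical of the three playbooks, so it is worth seeing concretely. The sketch below builds a schema.org `FAQPage` JSON-LD block from question/answer pairs; the pairs shown are placeholders, not the experiment's actual content:

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage JSON-LD string from (question, answer) pairs."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }, indent=2)

# Placeholder Q&A pair for illustration only.
markup = faq_jsonld([
    ("What is GEO?",
     "Generative Engine Optimization: optimizing content to appear in AI-generated answers."),
])
print(markup)
```

The resulting string goes into a `<script type="application/ld+json">` tag on the page; the GEO and LLM-SEO variants differ in prose structure rather than in markup like this.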
Citation rate is measured at weeks 1, 2, 4, and 6 across four LLMs: ChatGPT 4o, Perplexity, Gemini 2.0, and Claude Sonnet 4.6. Each LLM is queried with 20 representative questions for the article's topic. A citation is counted when the LLM's response includes the article URL or a clearly attributable statement lifted from the article content.
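The citation test in that protocol can be made mechanical. A minimal sketch of the detection logic, assuming responses arrive as plain strings (the article URL and example texts below are invented; how each LLM is actually queried is out of scope here):

```python
import re

ARTICLE_URL = "https://example.com/geo-vs-aeo"  # placeholder URL

def sentences(text):
    """Naive sentence splitter; only long sentences are worth matching,
    since short ones produce false-positive 'lifted' matches."""
    return [s.strip() for s in re.split(r"[.!?]\s+", text) if len(s.strip()) > 40]

def is_citation(response, article_text):
    """Count a citation when the response links the article URL
    or contains a long sentence from the article near-verbatim."""
    if ARTICLE_URL in response:
        return True
    return any(s.lower() in response.lower() for s in sentences(article_text))

def citation_rate(responses, article_text):
    """Fraction of the query responses that cite the article."""
    return sum(is_citation(r, article_text) for r in responses) / len(responses)
```

With 20 questions per LLM per checkpoint, `citation_rate` yields the per-variant numbers the hypotheses below are judged against.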
What To Watch For
Going into this experiment I have a working hypothesis for each LLM. I want them to be falsifiable. Here's what I'm predicting and what would prove me wrong:
- Perplexity. Prediction: GEO wins on Perplexity by a wide margin. Perplexity is RAG-native, so whichever variant gives the retrieval layer the cleanest, most extractable claims should get cited most. Falsifier: if Perplexity cites the AEO variant more than 2x the GEO variant, the retrieval layer is leaning harder on Q&A structure than on extractable-claim density, and my mental model of RAG citation ranking is wrong.
- Gemini 2.0. Prediction: AEO wins on Gemini. AEO was literally designed for Google's featured snippet extractors, and Gemini has tight integration with Google's index. Falsifier: if Gemini shows the opposite (GEO beating AEO by 2x+), the featured-snippet training signal has been flushed out of the new generation of Gemini models, and the old AEO playbook is now noise on Google's flagship LLM.
- ChatGPT 4o. Prediction: the three variants cluster tightly on ChatGPT. Training cutoffs dominate retrieval signals for training-first LLMs. Any structural optimization is a small perturbation on top of "is this domain in the training corpus at all." Falsifier: if one variant beats the others by more than 30%, the in-context retrieval path (ChatGPT's browsing-enabled modes) is doing more citation work than I think.
- Claude Sonnet 4.6. Prediction: results are noisy and inconclusive. Claude's citation behavior is the least studied of the four and its default configuration does the least live retrieval. Falsifier: Claude shows a sharp, reproducible preference for one variant. That would be a genuinely new data point about how Anthropic's RAG story is developing.
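The 2x and 30% falsifier thresholds above are mechanical checks once citation rates are in, which is the point of stating them up front. A sketch of both checks (the rate numbers in the test values are invented):

```python
def ratio_falsified(rate_a, rate_b, factor=2.0):
    """True if variant A beats variant B by more than `factor`,
    e.g. AEO citing at more than 2x the GEO rate on Perplexity."""
    return rate_b > 0 and rate_a / rate_b > factor

def margin_falsified(rates, margin=0.30):
    """True if the best variant beats the runner-up by more than
    `margin` relative, e.g. the 30% clustering threshold for ChatGPT."""
    best, runner_up = sorted(rates, reverse=True)[:2]
    return runner_up > 0 and (best - runner_up) / runner_up > margin
```

Writing the thresholds as code before data collection removes the temptation to reinterpret a near-miss after the fact.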
The finding I\'ll care about most: whether one framework sweeps all four LLMs. That would suggest the underlying quality signals are universal rather than platform-specific — and it would mean optimization advice can stop being vendor-specific, which is not what the "GEO/AEO/LLM-SEO are all different products" camp wants to hear. Results publish with Season 3.
FAQ
What is GEO (Generative Engine Optimization)?
GEO is an optimization framework developed by researchers at Princeton, Georgia Tech, and IIT Delhi (paper published 2023) for improving content visibility in AI-generated answers. GEO tactics include adding authoritative statistics and citations to content, using fluent and coherent language, including quotations from recognized sources, and making content easy to extract as self-contained factual statements. The original research measured a 40% improvement in citation rate using the combined GEO approach on a benchmark query set.
What is AEO (Answer Engine Optimization)?
AEO predates LLMs. It was originally designed for voice search and featured snippets. The goal: optimize content to appear as the direct answer to a question in Google, Alexa, or Siri. AEO tactics include structured Q&A format, concise factual statements that stand on their own, HowTo and FAQ schema markup, and targeting conversational query phrasing. These tactics have carryover to LLM citation because LLMs trained on web content absorb the same structural signals that made content "quotable" for featured snippets.
What is LLM-SEO?
LLM-SEO is the most recent framing and the least standardized. It refers to optimizing content specifically to be included in LLM training data and to be retrieved by RAG-based search systems like Perplexity. LLM-SEO tactics include building topical authority at scale, improving content freshness signals, implementing llms.txt and structured data for machine consumption, earning citations on domains that appear in common web crawls, and writing content in the format LLMs prefer for extraction. The field is moving fast with no consensus methodology yet.
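For readers unfamiliar with llms.txt: it is a plain Markdown file served at the site root (`/llms.txt`), and the shape below follows the public llms.txt proposal (an H1 title, a blockquote summary, then link sections). The site name and URLs here are invented placeholders:

```markdown
# Example Site

> One-sentence summary of what this site covers, written for machine consumption.

## Docs

- [Experiment design](https://example.com/experiment): three-variant citation test
- [Glossary](https://example.com/glossary): GEO, AEO, and LLM-SEO definitions

## Optional

- [Archive](https://example.com/archive): older posts, lower priority for crawlers
```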
Why test these frameworks against each other?
Because practitioners are being sold all three under different names and there's very little rigorous head-to-head data. The experiment creates a controlled comparison: same topic, same content length, same publishing conditions, three variants optimized with different playbooks. Citation rate measured at weeks 1, 2, 4, and 6 across ChatGPT 4o, Perplexity, Gemini 2.0, and Claude Sonnet 4.6. The goal: find out whether the frameworks produce different outcomes, and if so, which context each one fits.
When will the results be published?
The experiment runs as part of Season 3 of the CTAIO Labs podcast. Results publish alongside the S03E02 podcast episode. Subscribe to the newsletter if you want to be told when the data drops.