10 LLM Visibility Tools on 3 Real Brands
Profound, Peec AI, AthenaHQ, Otterly, Scrunch, Evertune, Rankscale, Bluefish, Semji, and Goodie AI scored on coverage, accuracy, pricing, and the freshness problem nobody puts on the dashboard.
CTAIO Labs · Season 3
How to show up when an AI agent is the one doing the searching. Ten LLM visibility tools tested on three real brands, three optimisation playbooks run on identical content, and a 30-day llms.txt citation experiment across three sites.
One identical article rewritten under each framework. Citation rates measured across ChatGPT, Perplexity, and Gemini. Which playbook actually moves the needle — and how much overlap is there?
llms.txt, schema markup, and FAQ structure rolled out on three sites. Per-engine citation delta measured weekly. Does the spec actually move citations, or is everyone just shipping noise?
Twelve Schema.org variations on one identical article. Citation rate measured weekly across ChatGPT, Perplexity, Gemini, and Claude. The controlled A/B isolating which schema choices actually move the metric.
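For readers who have never shipped structured data, here is what a single variation in that A/B might look like: an Article block with a nested FAQPage, expressed as JSON-LD in the page head. Every value below is a placeholder, not the actual test article; the type and property names (Article, FAQPage, Question, acceptedAnswer) are standard Schema.org vocabulary.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "headline": "Placeholder headline for the test article",
      "datePublished": "2025-01-01",
      "dateModified": "2025-01-15",
      "author": { "@type": "Person", "name": "Placeholder Author" }
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "Does llms.txt move citations?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Placeholder answer; the experiment measures this."
          }
        }
      ]
    }
  ]
}
```

The twelve variations in the A/B then differ in exactly this layer (which types are present, how they nest, which properties are filled) while the visible article stays byte-identical.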
CTAIO Labs is the field side of the network. The category-level explainers and the scored tool radar live on WeTheFlywheel — read those alongside these experiments for the full picture.
Marie Haynes' framing, the two meanings of "agentic search," the five engines worth tracking, and the publisher checklist. Read on wetheflywheel.com →
WTF Radar · Twelve generative engine optimisation platforms scored across 18 metrics. Six earned the Radar recommendation, with prices, coverage, and vendor independence notes. Read on wetheflywheel.com →
WTF Hub · Editorial cluster on the four AI Search optimisation disciplines. Pillar explainers, the live tool radar, and the experiments on this page. Read on wetheflywheel.com →
CTO POV · The executive-side argument: six of the eight workstreams that move citation rate are engineering surfaces, not marketing surfaces. The ownership matrix and the operational test. Read on wetheflywheel.com →
Agentic search is the use of AI agents — ChatGPT Agent, Perplexity Pro, Gemini Deep Research, Claude with web browsing — to complete information-gathering or task-oriented goals on the web. A user describes an outcome, and the agent autonomously searches, visits pages, reads, compares, and returns a synthesised answer with citations. The human never scans a SERP. For publishers, the implication is direct: optimise for the agent as reader, because the agent is the one deciding what the user hears.
Three experiments, run in sequence. S3E1 tests ten LLM visibility tools (Profound, Peec AI, AthenaHQ, Otterly, Scrunch, Evertune, Rankscale, Bluefish, Semji, Goodie AI) against three real brand portfolios. S3E2 takes one identical article and rewrites it under each of the three optimisation playbooks (GEO, AEO, LLM-SEO), then measures citation deltas across ChatGPT, Perplexity, and Gemini. S3E3 ships llms.txt, schema markup, and FAQ structure on three sites and measures the citation impact over 30 days.
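For context on that third experiment: llms.txt is a plain-markdown file served at a site's root (/llms.txt) that hands crawling agents a curated map of the site, per the llmstxt.org proposal. A minimal sketch of the shape of file S3E3 ships; the site name, summary, and links are placeholders, not the actual deployments:

```markdown
# Example Site

> One-paragraph summary of what the site covers and who it is for.

## Guides

- [Getting started](https://example.com/guides/start.md): Setup walkthrough
- [Pricing](https://example.com/pricing.md): Plans and limits

## Optional

- [Changelog](https://example.com/changelog.md): Release history
```

The spec is deliberately small: an H1 with the site name, a blockquote summary, then H2 sections listing the pages an agent should read first, with "Optional" marking what can be skipped under a tight context budget.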
The vendor pitch decks in this category claim a lot. Different platforms measure different things — mention rate, citation rate, sentiment — and call all of them "visibility." Different optimisation frameworks claim citation lift without comparing against a baseline or a sibling framework. The Labs experiments hold the brand, the article, the queries, and the engine list constant, so the deltas are interpretable.
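Because those definitions diverge, the Labs write-ups pin the metrics down before reporting deltas. A minimal sketch of the distinction in Python; the data structures are illustrative stand-ins, not the actual measurement harness:

```python
from dataclasses import dataclass

@dataclass
class EngineResponse:
    """One engine's answer to one tracked query."""
    engine: str            # e.g. "chatgpt", "perplexity", "gemini"
    answer_text: str       # the synthesised answer shown to the user
    cited_urls: list[str]  # the sources the engine actually linked

def mention_rate(responses: list[EngineResponse], brand: str) -> float:
    """Share of answers whose text names the brand at all (the loose metric)."""
    if not responses:
        return 0.0
    hits = sum(brand.lower() in r.answer_text.lower() for r in responses)
    return hits / len(responses)

def citation_rate(responses: list[EngineResponse], domain: str) -> float:
    """Share of answers that cite the brand's own domain as a source (the strict metric)."""
    if not responses:
        return 0.0
    hits = sum(any(domain in url for url in r.cited_urls) for r in responses)
    return hits / len(responses)

# A brand can be mentioned without being cited and cited without being named;
# a dashboard that reports a single "visibility" number hides that gap.
```

Computed per engine and per week against a fixed query list, the movement in citation_rate from baseline to post-rollout is the delta the S3E2 and S3E3 write-ups report.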
The category-level explainers — what GEO is, what AEO is, how agentic search differs from generative search — live on wetheflywheel.com/en/ai-search. The scored tool radar lives on wetheflywheel.com/en/radar/geo-tools. CTAIO Labs is the field side: real brands, real budgets, methodology and numbers published. Three surfaces, one network.
Each experiment publishes as its data set completes. The full Season 3 scorecard — combining the visibility-tools rankings, the framework-test citation deltas, and the llms.txt experiment results into one decision document — drops when the podcast wraps the season. Subscribers to the CTAIO newsletter get the headline numbers before the public write-up.
Now playing: Building My AI Clone — voice cloning, video avatars, lip sync, and the full production pipeline.
AI-augmented pipelines are changing how teams build, test, and deploy. The state of CI/CD in the age of agents.
The $100M migration nobody wants to talk about. Why ERP modernization fails and the patterns that actually work.