AI Search: Why evidence-dense FAQs win, and thin Q&A doesn't
Most FAQ advice in the AI visibility category sells one move: add FAQPage schema, get cited more. A study tracking 1,885 pages that added JSON-LD between August 2025 and March 2026 tested that claim directly. The lift on Google AI Mode was +2.4% and the lift on ChatGPT was +2.2%, both statistically indistinguishable from zero Ahrefs · 2026. The lever that drives AI citation is what is inside the answer, not the markup around it.
Schema is hygiene, not a citation lever
The vendor narrative is easy to find. One representative example reports a 19.72% rise in AI Overview visibility on its own site after deploying entity linking. A customer case in the same source reports a 69% rise in clicks on non-branded queries van Berkel · Schema App · 2025. The framing has travelled into agency decks, conference talks, and brand-team briefings. It produces a predictable instruction to in-house content teams: add the markup, expect the lift.
The 1,885-page study tested that instruction directly. The methodology was a matched difference-in-differences design against 4,000 control pages. Adding JSON-LD was the only intervention. Citations barely moved, and Google AI Overviews showed a small 4.6% decline Ahrefs · 2026.
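The arithmetic behind a difference-in-differences estimate is short enough to show. The sketch below is illustrative only: the function name and the citation counts are hypothetical and are not taken from the Ahrefs dataset. It subtracts the change seen on control pages from the change seen on pages that added schema, so that any background trend is removed from the estimate.

```python
def diff_in_diff(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change on treated pages minus change on control pages."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical mean citations per page before and after the schema-add date.
estimate = diff_in_diff(treated_pre=10.0, treated_post=11.0,
                        control_pre=10.0, control_post=10.8)
print(round(estimate, 2))
```

In this made-up case the treated pages gained 1.0 citation, but control pages gained 0.8 without any markup, so only the remainder can be attributed to the intervention.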
A second study points the same way. A 300,000-domain analysis of llms.txt adoption found no relationship between the file's presence and a domain's AI citation rate SE Ranking · 2025. Removing the variable from the model improved its accuracy. Two large independent datasets, two different interventions, the same finding: the file format and the page markup are not where citation is decided.
The page-decoration era of FAQ markup is over
Google itself sent the clearest signal in May 2026. As of May 7, 2026, FAQ rich results no longer appear in Google Search. Support in the rich-result report and the Rich Results Test ends in June 2026, and Search Console API support is removed in August 2026 Google · 2023.
The schema remains valid Schema.org and is still read by AI engines that ingest structured data. What is gone is the SERP decoration: the user-visible reward for marking up FAQ content with JSON-LD. The page-decoration era of FAQ markup is over. What remains is the content inside the answer.
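For teams that keep the markup as hygiene, a minimal sketch of generating it follows. The helper name faq_jsonld and the sample question are hypothetical. The property names (FAQPage, mainEntity, Question, acceptedAnswer, Answer) come from the Schema.org vocabulary.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a Schema.org FAQPage JSON-LD string from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Placeholder content; the answer text is where the evidence has to live.
print(faq_jsonld([("Does ExampleWidget work offline?",
                   "Yes, ExampleWidget works offline.")]))
```

The markup only wraps the answer text. Whatever evidence the answer carries, or lacks, passes through unchanged.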
Citation is earned by content density, not markup
The strongest evidence for what actually drives citation comes from the Princeton GEO study. Adding citations, quotations, and statistics to existing content raised its visibility in AI-generated answers by up to 41% on average Aggarwal et al. · Princeton University / Georgia Tech / Allen Institute for AI / IIT Delhi · 2024. The largest gains were on pages ranked outside the top of traditional search. The intervention was not markup. It was the addition of evidence inside the content.
The GEO-16 framework supports the same conclusion from a different angle. Three on-page layers were most strongly associated with being cited by AI answer engines: metadata and freshness, semantic HTML markup, and structured data arXiv · 2025. Pages scoring at least 0.70 on the GEO-16 quality score were cited at substantially higher rates than pages below it. The same was true of pages meeting at least 12 of the 16 quality pillars.
A third study isolates structure from semantics. The GEO-SFE framework separates content structure into three layers: document architecture, information chunking, and visual emphasis. Applied to the same underlying text across six mainstream AI search engines, it lifted citation rates by 17.3% and subjective answer quality by 18.5% arXiv · 2026. The semantic content was held constant. Only structure changed.
A rewriting study confirms the direction. Travel pages were edited to add credible citations and statistical evidence with cleaner phrasing. The rewrites produced a 15.63% rise in absolute word count surfaced inside generative responses. A position-weighted version of the same metric rose by 30.96% arXiv · 2025. The lever in every case is the evidence inside the answer.
The right answer length is the length the retrieval system is built to handle
AI assistants do not retrieve whole pages. They retrieve chunks. A multi-dataset analysis of chunk size in retrieval-augmented generation tested the trade-offs directly Bhat et al. · arXiv (cs.IR) · 2025. Smaller chunks of 64 to 128 tokens are optimal for short fact-based answers; larger chunks of 512 to 1,024 tokens work better for broader context. A high-absorption FAQ answer sits in a single chunk of roughly 150 to 450 tokens, or 80 to 250 words.
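A rough pre-publish check for that length target can be sketched as follows. It uses the 80 to 250 word range quoted above and counts words by whitespace splitting, which is an assumption: a real retrieval system counts model tokens, and the ratio of tokens to words varies by tokenizer. The function names are hypothetical.

```python
MIN_WORDS, MAX_WORDS = 80, 250  # word range quoted in the article


def word_count(text: str) -> int:
    """Count whitespace-separated words (a stand-in for real tokenization)."""
    return len(text.split())


def fits_single_chunk(answer: str) -> bool:
    """True when the answer falls inside the target word range."""
    return MIN_WORDS <= word_count(answer) <= MAX_WORDS


draft = "word " * 120  # placeholder for a drafted FAQ answer
print(word_count(draft), fits_single_chunk(draft))
```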
The chunking strategy itself matters as much as the model that embeds the result. A systematic study found that simple recursive token chunking around 100 tokens with no overlap consistently outperformed more elaborate strategies arXiv · 2025. Retrieval-tuned embedding models like Nomic and Intfloat E5 beat domain-specialised ones like SciBERT on the same benchmarks. The two high-yield choices are chunk size and embedding model. Most of the elaborate machinery around them does not pay for itself.
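To make the idea concrete, here is a minimal sketch of recursive chunking with no overlap. It is not the implementation used in the study. It assumes whitespace-separated words as a stand-in for tokens, and the boundary levels (paragraph, line, sentence, word) are an illustrative choice. Text is split at the coarsest boundary first, and only pieces that are still too long are split again at the next finer boundary.

```python
import re

# (split pattern, joiner) from coarsest to finest boundary.
LEVELS = [
    (r"\n\s*\n", "\n\n"),      # paragraphs
    (r"\n", "\n"),             # lines
    (r"(?<=[.!?])\s+", " "),   # sentences
    (r"\s+", " "),             # words
]


def n_tokens(text: str) -> int:
    """Whitespace word count, used here in place of a real tokenizer."""
    return len(text.split())


def recursive_chunks(text: str, max_tokens: int = 100, level: int = 0) -> list[str]:
    """Split text into ordered, non-overlapping chunks of at most max_tokens."""
    text = text.strip()
    if not text:
        return []
    if n_tokens(text) <= max_tokens or level >= len(LEVELS):
        return [text]
    pattern, joiner = LEVELS[level]
    pieces = [p for p in re.split(pattern, text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}{joiner}{piece}" if current else piece
        if n_tokens(candidate) <= max_tokens:
            current = candidate  # keep merging neighbours while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if n_tokens(piece) <= max_tokens:
            current = piece
        else:
            # Piece is still too long: split it at the next finer boundary.
            chunks.extend(recursive_chunks(piece, max_tokens, level + 1))
    if current:
        chunks.append(current)
    return chunks
```

An FAQ answer that already fits the budget comes back as a single chunk, which is the property the length guidance above is aiming for.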
Document structure carries signal beyond the chunk. A retrieval system that navigates a document's structure tree scores both passage relevance and position in the hierarchy. The approach set a new state of the art on multi-document question answering arXiv · 2025. Headings, section order, and parent-child relationships are themselves a ranking signal. A hierarchical chunker that respects these layers improves answer quality without paying a heavy time cost arXiv · 2025.
Position on the page also matters. A mapping of 100 Google AI Overview citations found that 55% of cited snippets sit in the top 30% of the source page CXL · 2026. The middle of the page, between the 30% and 60% marks, produces 24% of citations, and everything past the 60% mark produces 21%. Answers buried below the fold are materially less likely to be picked up. The combined finding: short chunk, clean hierarchy, near the top of the page.
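A simple way to see where an answer sits is to measure its offset in the page text. The sketch below assumes character offsets in the extracted text as the measure of position, which may differ from how the CXL study coded its pages. The function name and band labels are hypothetical.

```python
from typing import Optional


def position_bucket(page_text: str, snippet: str) -> Optional[str]:
    """Return the page band in which the snippet starts, or None if absent."""
    index = page_text.find(snippet)
    if index < 0 or not page_text:
        return None
    relative = index / len(page_text)  # 0.0 is the top of the page
    if relative < 0.30:
        return "top 30%"
    if relative < 0.60:
        return "middle (30% to 60%)"
    return "bottom 40%"


page = "intro " * 10 + "The direct answer." + " more detail" * 50
print(position_bucket(page, "The direct answer."))
```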
A reproducible structure for a high-absorption FAQ answer
The plan that emerges from the retrieval literature is concrete enough to template. A high-absorption FAQ answer fits in a single retrieval chunk and contains five elements in a fixed order.
The first sentence pairs the entity name with the direct answer: yes, no, or one concise statement of fact. It opens without filler. The retrieval system that lifts this sentence as a snippet will surface the entity name and the answer in the same breath.
The second and third sentences carry the core supporting fact: a number, a defined term, or an explicit comparison. The evidence does the work that the topic sentence asserts. Where a counterweight or limit exists, it sits inside the same chunk, not in a later section.
The fourth and fifth sentences add one supporting detail and one final framing or differentiator. The detail anchors the claim against an alternative or against a baseline. The framing closes the chunk with a clean landing.
Below the answer block sits a source citation, an author name with credential, a verification date, and machine-readable provenance. The citation lives on the same line as the claim it supports, not in a collapsible.
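The structure above can be sketched as a small template. Everything here is hypothetical: the class name, the field names, the mapping of sentences to fields, and the placeholder values. It is one possible reading of the five-part order, not a standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FaqAnswer:
    entity: str
    direct_answer: str  # sentence 1: entity name plus the answer
    evidence: str       # sentences 2-3: number, defined term, or comparison
    detail: str         # sentence 4: one supporting detail
    framing: str        # sentence 5: final framing or differentiator
    source: str
    author: str         # name with credential
    verified_on: date

    def problems(self) -> list[str]:
        """List structural gaps; an empty list means the template is filled."""
        issues = []
        if self.entity.lower() not in self.direct_answer.lower():
            issues.append("first sentence does not name the entity")
        if not self.source.strip():
            issues.append("missing source citation")
        return issues

    def render(self) -> str:
        """Answer block first, provenance line directly below it."""
        body = " ".join([self.direct_answer, self.evidence,
                         self.detail, self.framing])
        provenance = (f"Source: {self.source}. Author: {self.author}. "
                      f"Verified: {self.verified_on.isoformat()}.")
        return f"{body}\n{provenance}"


answer = FaqAnswer(
    entity="ExampleWidget",
    direct_answer="Yes, ExampleWidget works offline.",
    evidence="[Core supporting fact with its number or comparison.]",
    detail="[One supporting detail against a baseline.]",
    framing="[Closing differentiator.]",
    source="[Study name, year]",
    author="[Author name, credential]",
    verified_on=date(2026, 1, 15),
)
print(answer.problems())
print(answer.render())
```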
An e-commerce-specific testbed confirms the pattern. Across 15 product-page rewriting tactics, no single hand-crafted heuristic reliably wins. Iterative prompt-optimised rewrites converge on the same structural recipe across product categories arXiv · 2025. The structure is stable. The content is what varies.
Pinterest's published GEO framework reaches the same place from a different starting point. Individual images lack the words and authority signals generative search rewards. The system predicts what users would search for from each image, groups images into theme pages, and links them with authority signals. The live deployment produced 20% organic traffic growth Pinterest · 2026. The lesson generalises: the high-absorption unit is a self-contained chunk with named entity, direct answer, evidence, and authority signals on the same surface.
A role-augmented approach extends the pattern further. A page can be rewritten through several informational personas and then refined against each one. The role-rewritten version produced larger gains in subjective impression and measured presence inside generative answers than single-axis approaches arXiv · 2025. The implication is that a high-absorption FAQ answers a question several readers might be asking from different angles.
The AutoGEO framework codifies what generative search engines reward when they pick and rewrite content for AI answers. The extracted preferences turn into rewriting rules that raise content traction in AI answers while preserving search utility arXiv · 2025. The rules are not folklore. They are testable, and the tests favour the structure described above.
Evidence density, in operational terms
The Princeton GEO finding reduces to four operational moves. Each is a copy-level change a writer can apply to any FAQ answer in any product category.
The first move is definition. Define key terms inline the first time they appear. A definition inside the chunk gives the retrieval system a clean handle for the entity. It also gives the human reader permission to keep reading.
The second move is statistic. Replace qualitative claims with quantified ones. "Most brand teams" becomes "67% of brand teams in the 2026 cohort." A statistic with its source on the same line is the highest-yield single edit in FAQ copy.
The third move is comparison. Anchor the product or the position against an alternative or a baseline. "Better than competitors" is filler. "Three times the response rate of the unstructured baseline" is evidence. The chunk that contains a comparison retrieves with the comparison intact.
The fourth move is provenance. Name the source on the same line as the claim. A study, a year, a one-sentence methodology. Google's E-E-A-T quality rater guidelines update added experience as a fourth pillar Google · 2022. Trust sits above the other three pillars and is reinforced by visible attribution.
The freshness dimension extends the same logic. Pages not updated for a quarter are over three times more likely to lose AI citations. 83% of commercial citations come from pages refreshed within a year AirOps · 2026. The four moves are the content. Visible date stamps and quarterly review are the maintenance.
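The quarterly review can be automated with very little code. This is a minimal sketch under stated assumptions: a quarter is approximated as 90 days, and the mapping of URLs to last-updated dates is a hypothetical input that would come from a CMS export or sitemap.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # "a quarter", approximated as 90 days


def stale_pages(last_updated: dict[str, date], today: date) -> list[str]:
    """Return URLs whose last update is older than the review interval."""
    return sorted(url for url, updated in last_updated.items()
                  if today - updated > REVIEW_INTERVAL)


# Hypothetical pages and dates.
pages = {"/faq/pricing": date(2026, 1, 1), "/faq/returns": date(2026, 5, 1)}
print(stale_pages(pages, today=date(2026, 6, 1)))
```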
What to avoid
Several patterns recur in FAQ content that fails the AI test. Each is predictable enough to audit against.
Marketing fluff sentences are the first. A sentence that says "our customers love our award-winning service" carries no defined term, no number, and no named entity. The chunker has nothing to lift. The sentence retrieves as noise.
Vague qualitative claims without numbers are the second. "Many," "leading," "trusted by enterprise teams." Each can be replaced with a number. Each should be.
The same fact repeated across many FAQs is the third. A page that says "our pricing is competitive" in eight different FAQ answers produces eight chunks with the same low-value claim. None of them is the chunk the retrieval system needs.
Provenance buried in collapsibles is the fourth. A source link that appears only when the user clicks "show more" does not exist for the chunker. The position-on-page finding compounds the problem. A citation in a collapsible at the bottom of the page sits in the 21% zone, not the 55% zone CXL · 2026.
FAQs hosted only in PDFs or image badges are the fifth. A claim inside a JPEG is not text the retrieval system reads. A PDF without a hosted HTML twin retrieves poorly even when the PDF is technically indexable.
The pattern across all five is the same. The chunker rewards self-contained, evidence-dense, source-attributed prose in HTML. Everything else is friction.
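Some of these patterns can be flagged mechanically before a human review. The sketch below is a heuristic, not a quality score: the term list is illustrative and incomplete, the function names are hypothetical, and a digit check is only a crude proxy for the presence of evidence.

```python
import re
from collections import Counter

# Illustrative list of qualifiers that usually stand in for a missing number.
VAGUE_TERMS = ("many", "leading", "trusted", "award-winning", "competitive")


def audit_answer(text: str) -> list[str]:
    """Flag vague qualifiers and answers that contain no number at all."""
    findings = []
    lowered = text.lower()
    hits = [t for t in VAGUE_TERMS
            if re.search(rf"\b{re.escape(t)}\b", lowered)]
    if hits:
        findings.append("vague qualifiers: " + ", ".join(hits))
    if not re.search(r"\d", text):
        findings.append("no number in answer")
    return findings


def repeated_sentences(answers: list[str]) -> list[str]:
    """Return sentences that appear in more than one FAQ answer."""
    counts: Counter = Counter()
    for answer in answers:
        sentences = {s.strip().lower()
                     for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()}
        counts.update(sentences)
    return sorted(s for s, n in counts.items() if n > 1)


print(audit_answer("Our customers love our award-winning service."))
print(repeated_sentences(["Our pricing is competitive. Plans start monthly.",
                          "Our pricing is competitive. Support is included."]))
```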
An audit of 30 live llms.txt files in the wild found five recurring anti-patterns of the same family Kenimo · DEV Community · 2025. The recurring failures were overlong files, URLs contradicting robots.txt, no Markdown twin of pages, marketing prose instead of pointers, and files frozen with dead links. The failure modes generalise.
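One of those failures, URLs in llms.txt that robots.txt blocks for the crawler meant to read them, can be checked with Python's standard library. The sketch assumes llms.txt lists pages as Markdown links and uses GPTBot as an example user agent. The function name and the sample file contents are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser


def blocked_llms_urls(llms_txt: str, robots_txt: str, user_agent: str) -> list[str]:
    """Return llms.txt URLs that robots.txt disallows for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pull the target of every Markdown link: [label](https://...)
    urls = re.findall(r"\((https?://[^)\s]+)\)", llms_txt)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


robots = "User-agent: GPTBot\nDisallow: /docs/\n"
llms = ("# Example\n"
        "- [Docs](https://example.com/docs/start)\n"
        "- [Pricing](https://example.com/pricing)\n")
print(blocked_llms_urls(llms, robots, "GPTBot"))
```

Any URL the check returns is a pointer that the crawler has been told not to follow.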
Sources
Sources are tiered per our methodology & sources page.
Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior
arXiv · 2026
A structural-engineering framework called GEO-SFE separates content structure into three layers: document architecture, information chunking and visual emphasis. Applied to the same underlying text, the framework lifts citation rates in generative engines by 17.3% on average and subjective answer quality by 18.5% across six mainstream AI search engines. The semantic content itself is preserved; only structure changes.
Methodology note
arXiv paper 2603.29979 by Yu, Yang, Ding and Sato, submitted March 2026. The authors define structural features at macro, meso and micro levels and build predictive models for citation probability that are tuned per engine. They evaluate the framework against six generative engines and report consistent gains in citation rate and quality across configurations.
Pinterest: Generative Engine Optimization — A VLM and Agent Framework for Acquisition Growth
Pinterest · 2026
Individual images lack the words and authority signals that generative search rewards, so visual platforms risk being skipped over while users get their answer in the chat. Pinterest's response is to predict what users would search for from each image, group images into theme pages, and link them with authority signals. The live system added 20% organic traffic growth.
Methodology note
First-party engineering paper from Pinterest. Vision-Language Models were fine-tuned to predict likely search queries for each image, aided by agents that mine real-time internet trends. Predicted queries drive collection pages built from multimodal embeddings, with hybrid two-tower nearest-neighbour architectures handling authority-aware interlinking. The system runs in production across billions of images and tens of millions of collections.
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
arXiv · 2025
Across 15 common product-page rewriting tactics tested on a shopping benchmark, no single hand-crafted heuristic reliably wins. A simple iterative prompt-optimisation routine outperforms all of them. The optimised prompts converge on the same pattern across categories, pointing to a stable, domain-agnostic recipe for making product listings more visible to conversational shopping agents.
Methodology note
First public e-commerce GEO benchmark (E-GEO) with over 7,000 multi-sentence consumer product queries paired with relevant listings, capturing intent, constraints, and shopping context. The authors evaluated 15 rewriting heuristics on this benchmark, then formulated GEO as an optimisation problem and ran a lightweight iterative prompt-optimisation algorithm. Data and code are public.
What Generative Search Engines Like and How to Optimize Web Content Cooperatively (AutoGEO)
arXiv · 2025
AutoGEO is a framework that extracts the preferences generative search engines apply when picking and rewriting content for AI answers. The researchers turn those preferences into rewriting rules, then test them on the GEO-Bench benchmark plus two new benchmarks built from real user queries. Both the prompt-based AutoGEO API and the trained AutoGEO Mini model raise content traction in AI answers while preserving search utility.
Methodology note
Academic preprint posted on arXiv on October 13, 2025, by researchers from Carnegie Mellon (Yujiang Wu, Shanshan Zhong, Yubin Kim, Chenyan Xiong). The team probes frontier large language models to surface preference rules, then uses them as context engineering for one system and as rule-based rewards for training a smaller cost-efficient model. Code is released on GitHub.
Equipping Retrieval-Augmented Large Language Models with Document Structure Awareness (RDR²)
arXiv · 2025
Treating retrieved passages as isolated chunks throws away signal that the original document layout carries. A router that navigates a document's structure tree, scoring both passage relevance and its position in the hierarchy, sets a new state of the art on multi-document question answering. Headings, section order, and parent-child relationships are themselves a ranking signal.
Methodology note
Academic paper (RDR², EMNLP 2025 Findings) introducing a trainable document-routing step inside the retrieve-and-read pipeline. An LLM-based router walks document structure trees with automatic action curation and structure-aware passage selection. The framework was evaluated across five question-answering datasets that demand multi-document synthesis.
HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking
arXiv · 2025
Most retrieval benchmarks cannot tell a good chunking strategy from a bad one because the answers can be found in any reasonable split of the text. A new benchmark built on evidence-dense questions shows that chunking choices visibly change end-to-end answer quality, and that a hierarchical, multi-level chunker improves performance without paying a heavy time cost.
Methodology note
Academic paper introducing HiCBench (manually annotated multi-level chunk points plus synthesised evidence-dense question-answer pairs with traceable evidence) and the HiChunk framework: fine-tuned large language models that produce multi-level document structure, combined with an Auto-Merge retrieval algorithm. Chunking quality was tested across the full retrieval-augmented generation pipeline.
AI Answer Engine Citation Behavior: An Empirical Analysis of the GEO-16 Framework
arXiv · 2025
Three on-page properties showed the strongest association with whether a page got cited by AI answer engines: metadata and freshness, semantic HTML markup, and structured data. Pages that scored at least 0.70 on the GEO-16 quality score and met at least 12 of 16 quality pillars were cited at substantially higher rates than pages that did not.
Methodology note
70 product-intent prompts were run across Brave Summary, Google AI Overviews, and Perplexity, producing 1,702 citations across 1,100 unique URLs. The researchers audited each cited page against a 16-pillar framework and used logistic models with domain-clustered standard errors. The study focuses on English-language B2B SaaS pages. Published September 2025.
Role-Augmented Intent-Driven Generative Search Engine Optimization
arXiv · 2025
Generative search engines reward content that anticipates the different roles a user might be playing when they ask a question. Rewriting a page through several informational personas, then refining it, produced larger gains in both subjective impression and measured presence inside generative answers than approaches that optimise on a single axis.
Methodology note
Academic paper introducing Role-Augmented Intent-Driven G-SEO, which models search intent through reflective refinement across multiple informational roles. The authors extended an existing GEO dataset with diversified query variations and introduced G-Eval 2.0, a six-level large-language-model-augmented rubric for finer-grained, human-aligned scoring of optimisation outputs.
Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation
arXiv · 2025
Rewriting web copy to add credible citations, statistical evidence, and cleaner phrasing measurably increases how much of that copy gets reproduced in AI answers. Optimised travel pages saw a 15.63% rise in absolute word count surfaced inside generative responses and a 30.96% rise on a position-weighted version of the same metric, with small computational cost.
Methodology note
The team fine-tuned a BART-base transformer on 1,905 paired travel-website passages, each pairing raw copy with a generative-engine-optimised rewrite. Quality was scored with ROUGE-L and BLEU against the optimised targets; visibility was tested by feeding both versions to Llama-3.3-70B and counting how much of each rewrite appeared in the model's responses.
Chunk Twice, Embed Once: Systematic Study of Segmentation and Representation Trade-offs
arXiv · 2025
How a page is split into chunks matters as much for retrieval as which model embeds it. Simple recursive token chunking around 100 tokens with no overlap (R100-0) consistently beat more elaborate strategies. Retrieval-tuned embedding models such as Nomic and Intfloat E5 outperformed domain-specialised ones like SciBERT, suggesting embedding choice and chunk size are the high-leverage levers.
Methodology note
Systematic evaluation in a chemistry retrieval setting: 25 chunking configurations across five method families combined with 48 embedding models, tested on three chemistry retrieval benchmarks including the authors' new QuestChemRetrieval dataset. Datasets, code, and benchmark results were released publicly.
Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis
arXiv (cs.IR) · Sinchana Ramakanth Bhat et al. · 2025
Chunk size in retrieval-augmented generation has a large effect on retrieval quality. Smaller chunks of 64 to 128 tokens are optimal when answers are short and fact-based. Larger chunks of 512 to 1024 tokens work better when broader context is needed. Embedding models react differently: Stella benefits from larger chunks for long-range retrieval, while Snowflake performs better with smaller chunks for entity-level matching.
Methodology note
Peer-style arXiv paper (2505.21700) by Bhat, Rudat, Spiekermann and Flores-Herr, submitted May 2025. The authors systematically test fixed-size chunking from 64 to 1024 tokens across multiple embedding models and both short-form and long-form datasets, measuring retrieval performance across configurations. Results highlight the interaction between chunk size, embedding model and dataset characteristics.
GEO: Generative Engine Optimization
Princeton University / Georgia Tech / Allen Institute for AI / IIT Delhi · Pranjal Aggarwal et al. · 2024
Adding citations, quotations, and statistics to content can increase its visibility in AI-generated answers by up to 41% on average. Pages ranked outside the top of traditional search saw the largest gains. The effect varies by content domain and by AI engine, but the lift from evidence-style content elements is consistent across the conditions tested.
Methodology note
10,000 questions were run through generative search engines. The researchers compared answers before and after applying nine content optimisation strategies, including citations, quotations, statistics, and authoritative language. They measured visibility as the share of the AI answer attributable to the optimised page, using both word position and word count metrics. Peer-reviewed at KDD 2024.
Google FAQ Structured Data Guidelines (FAQPage Schema)
Google · 2023
Google's FAQPage structured data documentation announces that as of May 7, 2026, FAQ rich results no longer appear in Google Search. Support in the rich-result report and Rich Results Test ends in June 2026, and Search Console API support is removed in August 2026. While the feature is being deprecated, FAQ markup itself remains valid Schema.org and is still used by AI engines that read structured data.
Methodology note
Official Google Search Central documentation page for FAQPage structured data, last updated 8 May 2026. The page sets out the schema requirements (FAQPage, Question, Answer), eligibility rules (limited to authoritative health or government sites), content guidelines and the deprecation timetable for FAQ rich results.
Google · 2022
Google announced an update to its Search Quality Rater Guidelines, adding a second E to E-A-T to create E-E-A-T: Experience, Expertise, Authoritativeness, and Trustworthiness. Experience asks whether content reflects first-hand or life experience with the subject. Trust is positioned as the most important of the four, and the others support it. The guidelines instruct human raters who evaluate search quality.
Methodology note
Official Google Search Central blog announcement from December 15, 2022, accompanying a revised version of the public Search Quality Rater Guidelines PDF. The guidelines describe how Google's external quality raters score sample results to train and evaluate ranking systems. Ratings do not directly change rankings but feed into system improvements.
We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved
Ahrefs · 2026
Across 1,885 pages that added JSON-LD between August 2025 and March 2026, schema produced no meaningful uplift in AI citations. Matched difference-in-differences tests against 4,000 control pages showed +2.4% on Google AI Mode and +2.2% on ChatGPT (both statistically indistinguishable from zero) and a small 4.6% decline on Google AI Overviews. 53% of AI-cited pages already carry schema, but this reflects overall site quality.
Methodology note
Ahrefs identified 1,885 URLs that transitioned from no JSON-LD to having JSON-LD between August 2025 and March 2026, using its crawler database. Each treated page was matched to three control pages from different domains with similar pre-period citation levels. Citation changes were measured 30 days before and after the schema-add date across AI Overviews, AI Mode and ChatGPT using four statistical tests including matched difference-in-differences.
Where Google AI Overviews Cite From: A 100-Page Analysis
CXL · 2026
In a mapping of 100 Google AI Overview citations, 55% of cited snippets sit in the top 30% of the source page. The middle third of the page produces 24% of citations. Everything past the 60% mark accounts for just 21%. Pages whose answer is buried below the fold are far less likely to be picked up.
Methodology note
CXL coded 100 individual AI Overview citations by where in the source page the cited passage appeared, splitting each page into vertical thirds. The data was used to assess how page position relates to citation probability. The original page was inaccessible at the time of writing; figures were confirmed via secondary coverage referencing the CXL study directly.
LLMs.txt Shows No Clear Effect on AI Citations (300K domains)
SE Ranking · 2025
Across 300,000 domains, only 10.13% had an llms.txt file. Adoption is roughly flat across traffic tiers, with high-traffic sites slightly less likely (8.27%) to use it than mid-tier ones (10.54%). Statistical tests and an XGBoost model found no relationship between the presence of llms.txt and how often a domain is cited by AI engines. Removing the variable from the model actually improved its accuracy.
Methodology note
SE Ranking study of nearly 300,000 domains, published November 2025. The team checked each domain for an llms.txt file, segmented adoption by monthly traffic, and modelled citation frequency using Spearman correlation, XGBoost regression and SHAP analysis. The conclusion is based on whether llms.txt presence improved or degraded model predictions of LLM citations.
AirOps · 2026
Pages not updated for a quarter are over three times more likely to lose AI citations. About 70% of cited pages were updated in the last 12 months, and 83% of commercial citations come from pages refreshed within a year. Sequential heading hierarchies correlate with 2.8 times higher citation likelihood; 87% of cited pages use a single H1, and 48% of citations come from user-generated platforms.
Methodology note
Industry report from AirOps with Kevin Indig, drawing on millions of citation datapoints across ChatGPT, Google AI Overviews, AI Mode, Gemini, and Perplexity. Findings are organised around freshness, on-page structure, schema use, user-generated content, off-site mentions, and visibility stability, with specific percentage gaps tied to each signal.
What 2025 Revealed About AI Search and the Future of Schema Markup
Schema App · Martha van Berkel · 2025
In 2025, Google and Microsoft publicly confirmed they use Schema markup for generative AI features, and ChatGPT confirmed it uses structured data to decide which products appear in results. Schema App reported a 19.72% rise in AI Overview visibility on its own site after deploying Entity Linking, and customer InSinkErator a 69% rise in clicks on non-branded queries.
Methodology note
First-party essay by Schema App's CEO. The piece argues structured data should be treated as a knowledge graph rather than a rich-result trick, and uses examples from Schema App's own site and named customers (InSinkErator, Wells Fargo) plus public statements from Google, Microsoft, and ChatGPT to support the case.
I Audited 30 llms.txt Files in the Wild — 5 Anti-Patterns Already Forming
DEV Community · Kenimo · 2025
An audit of 30 live llms.txt files found five recurring failures: overlong files with too many links; URLs contradicting robots.txt for the very AI crawlers expected to read them (about a third of files); no Markdown twin of pages (24 of 30); marketing prose instead of pointers; and files frozen since 2024 with dead links and renamed slugs.
Methodology note
Practitioner blog post on dev.to. The author manually audited 30 llms.txt files in the wild against the original Jeremy Howard proposal and against guidance from Mintlify and the llmoframework, then documented five anti-patterns with examples. Three of the audited files were the author's own, used as a control on bias.
About the author Max Ackermann
Max Ackermann is founder and Managing Director of info.link, the product data platform that makes brands visible in AI search and connects every physical product to the web through GS1 Digital Link. He writes about AI search and generative engine optimization (GEO), AI-powered commerce, and how brands can structure product data for ChatGPT, Gemini, Perplexity, and retailer AI assistants like Amazon Rufus.
Max has 20+ years of experience building digital products and businesses. He previously led McKinsey's Corporate Venture and Design teams across Europe, and as Managing Director of a leading US digital agency he built platforms with Nike, Google, Meta, and Airbnb. He founded the UX Design program at Central Saint Martins College, University of the Arts London, and is a Fellow of the UK's Higher Education Academy. info.link is headquartered in Hamburg and Berlin and counts GS1 Germany among its investors.


