Everyone has theories about how LLMs access web content. We decided to test one.
The hypothesis: if ChatGPT and Gemini are crawling websites to answer questions, embedding context directly in HTML might influence their responses. The logic seemed sound-these systems cite web sources, search engines read HTML comments, so strategic comments might guide LLM answers.
We were wrong. And the way we were wrong taught us something more useful than being right would have.
The Experiment
We used a healthcare client’s “Meet the Founders” page as a test case. When we queried both LLMs about where the founders originally met, we got vague, inconsistent answers. No solid knowledge of this specific detail. Clean baseline established.
First test: we embedded an HTML comment in the page’s <head>, explicitly addressed to ChatGPT and Gemini, stating where the founders met. Waited for indexing. Ran queries across multiple sessions. Tested question variations.
Complete failure. The LLMs continued giving the same vague answers.
Second test: we moved the content to a hidden div in the <body>-visible to traditional crawlers but hidden from users via CSS. This time we pushed harder. We told the LLMs the information was in the HTML. Specified which section. Gave them the opening syntax. Asked them to extract that specific information.
We were actively trying to help them find our planted data.
Still complete failure. They either claimed they couldn’t find it, made up different answers, or gave generic non-committal responses.
What We Actually Learned
This wasn’t a failed trick. This was successful scientific testing that produced concrete data about LLM behavior.
The pattern we’re seeing: LLM crawling capabilities appear to be roughly where Google’s were 5-6 years ago.
Modern Google renders JavaScript fully, processes complex dynamic content, reads HTML comments, understands hidden divs. Google circa 2018-2019 preferred static HTML, struggled with dynamic content, needed clear and visible markup.
LLMs in 2024 appear to access pre-processed, simplified content-likely visible text only. They can’t or won’t access HTML comments. They can’t or won’t process hidden div content. Even when we literally told them where to look, gave them the exact syntax pattern, and asked them to extract specific information, they couldn’t access it.
This isn’t about LLMs being too smart to fall for hidden content. They simply don’t have access to that layer of your HTML.
The Implications for Content Strategy
If LLM crawling is 5-6 years behind Google, the lessons we learned then apply now.
Content potentially invisible to LLMs: FAQ schema hidden by default, accordion content requiring interaction, tab content not in the initial DOM, mobile-hidden elements, dynamically loaded content, anything behind “Read more” buttons, JavaScript-rendered sections.
The 2018 SEO principles that apply to 2024 LLMs: put critical content in visible HTML, don’t hide important information in accordions or tabs, make your primary message clear in the initial page load, use straightforward HTML structures, don’t rely on complex rendering for key content.
The simpler and more accessible your content structure, the better your chances with LLMs.
The Gap Between Speculation and Testing
The industry is rushing to “optimize for AI answers.” Our test reveals a critical gap in that conversation.
If LLMs can’t read content when you embed it strategically in your HTML, tell them explicitly where it is, and give them the exact pattern to look for-what does that mean for subtle optimization tactics being sold as “AI SEO”?
Most of it is speculation built on speculation.
We tested the basics. LLMs can’t even access content Google mastered reading years ago. Before you invest in complex AI optimization tactics, test whether LLMs can actually see the content you already have.
Why Failed Experiments Matter
This test didn’t prove our hypothesis. It disproved it comprehensively.
That’s more valuable.
Now we have data: HTML comments are invisible to LLMs. Hidden div text is invisible to LLMs. Even explicit directions don’t help LLMs access this content. LLM content access is more limited than industry speculation suggests.
We didn’t waste time building strategy around a false assumption. We tested it, documented it, and moved on with data.
The most dangerous position in SEO is certainty without testing. We were fairly confident LLMs might read HTML comments. The hypothesis seemed logical. The test proved us completely wrong.
But it taught us something actionable: LLM crawling is following the same evolution Google went through. That means we already know the playbook.
We run controlled experiments because industry speculation isn’t verified behavior. If you’re questioning assumptions in your AI strategy, let’s talk-I share what we’re learning from real testing.
