AI Crawlability Checker

Q1. Why Is AI Crawlability the New Front Line of the Shift from SEO to GEO? [toc=1. Why AI Crawlability Matters]

For fifteen years, "getting found" meant one thing: rank in Google's ten blue links. That single scoreboard is cracking. A Head of Growth we spoke with last quarter had a page-one Google ranking, solid traffic, and a quiet panic. Buyers kept saying they'd "asked ChatGPT" and never heard the brand's name. What went under the microscope was not a keyword report. It was a single question: can the machine even read the page?

Can the machine even read the page?

🎯 The Shift Nobody Warned You About

Search is splitting into two games. One is ranking a link. The other is becoming the answer an AI engine quotes back to a buyer. Gartner projects that over 50% of search traffic will move from traditional engines to AI-native platforms by 2028. Ethan Smith of Graphite frames the outcome bluntly: the "pie of search is getting larger," and AI chat is a new, mostly organic slice.

⚠️ Why This Outcome Is Binary

Here is the part that changes how you should think. In AI answers, there is no page two. When a buyer asks Claude or ChatGPT for the best tool in your category, only five to ten brands get named. If you are not in that set, you do not rank low. You are absent from the buying conversation entirely.

That is the fear we hear most in audits. A founder asks, in effect, whether the machine can even see the content they spent real money building. It is a fair question, because the answer is often no.

💸 The Eligibility Gate Comes Before Everything

Most AI crawlers fetch your raw HTML. They do not run JavaScript. Vercel and MERJ, studying GPTBot, ClaudeBot, and PerplexityBot at the scale of hundreds of millions of monthly requests, found these bots pull HTML but do not execute scripts. So if your content only appears after JavaScript runs, the crawler sees a near-empty page.

Crawlability is the gate you pass before the contest starts. You can have the best answer in your category and still lose, simply because the engine never read it. This is why we treat it as table stakes for GEO, not a technical afterthought buried in an engineering backlog.

At MaximusLabs, we start every engagement with one blunt question: can the machine even see your page? Being the answer starts with being readable. This article walks through how to test that, fix it, and move from merely readable to actually cited. If you want to see where you stand today, our AI crawlability checker is the fastest first step.

Q2. What Is an AI Crawlability Checker, and Why Does JavaScript Break It? [toc=2. What a Checker Does]

An AI crawlability checker tests whether crawlers like GPTBot, ClaudeBot, and PerplexityBot can actually read your page. It matters because most of these bots fetch your raw HTML but never run JavaScript. If your content appears only after scripts execute, the checker shows a near-empty page, and AI engines cannot cite what they cannot see, no matter how strong the writing is.

Most bots stop at the raw HTML

🧩 Raw HTML Versus the Rendered Page

Think of your page in two states. The first is raw HTML, the file the server sends before anything runs. The second is the rendered DOM (Document Object Model), the finished page a human sees after JavaScript executes in the browser.

Humans always see the rendered version. Most AI crawlers stop at the raw file. Ethan Smith's technical guidance is direct: keep your most important content in plain HTML, because "AI systems primarily read your raw HTML," and content that appears only after interaction can be missed entirely. This is the core of any serious technical SEO and website audit.

📊 The Scale of the Blind Spot

This is not a fringe edge case. Vercel and MERJ measured hundreds of millions of GPTBot and ClaudeBot requests per month and found these crawlers fetch HTML without rendering JavaScript. Multiply that across every product page, comparison, and FAQ that loads client-side, and the invisible surface area gets large fast.

A checker exists to make that gap visible before it costs you. It compares what is in your raw HTML against what a human sees, then flags the difference.
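The comparison a checker performs can be sketched in a few lines of Python. This is a minimal illustration, not how any particular checker is built: it assumes you already have two strings, the raw HTML the server sent and the rendered HTML captured from a browser, and the names `text_chunks` and `crawl_gap` are ours.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            # Collapse whitespace so formatting differences do not matter.
            self.chunks.append(" ".join(data.split()))


def text_chunks(html):
    """Return the readable text nodes of an HTML string, in document order."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


def crawl_gap(raw_html, rendered_html):
    """Return text a human sees in the rendered page that the raw HTML lacks."""
    raw = set(text_chunks(raw_html))
    return [chunk for chunk in text_chunks(rendered_html) if chunk not in raw]
```

Anything `crawl_gap` returns is content a non-rendering crawler would likely miss. An empty list means the raw HTML already carries every text node the rendered page shows.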

✅ Crawlable Is the Floor, Not the Finish

Here is the nuance most tool pages skip. Passing a crawlability check does not mean you get cited. It means you are eligible to be considered. Smith's own framing holds: strong SEO fundamentals, including clean crawlability, are "the floor," and the real work is building on top of that with answer engine optimization.

We say the same thing to clients, sometimes to their surprise. When we audit a site at MaximusLabs, step one is disabling JavaScript and viewing the raw HTML, because that is roughly what ChatGPT and Claude actually read. If the body copy is not there, nothing else in the GEO plan matters yet.

Q3. Which AI Crawlers Can Reach and Read Your Site? [toc=3. Who Can See You]

Two gates decide whether an AI crawler sees you: access and rendering. Your robots.txt must allow GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot, or you are blocked at the door. Even when allowed, most do not render JavaScript. Vercel found GPTBot, ClaudeBot, and PerplexityBot fetch HTML but skip JavaScript, while Google's stack renders it, so client-side content stays invisible to ChatGPT and Claude.

Two gates: access, then rendering

🚪 Gate One: Can the Bot Reach You?

Access is the first filter, and it is easy to fail by accident. Some teams block "unknown" crawlers to save server resources. Smith's warning is sharp: if you do not get indexed, "you're not even in the game," so turning on OAI-SearchBot and GPTBot access is a prerequisite, not an option.

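Bot names also matter more than people assume. GPTBot, OAI-SearchBot, and ChatGPT-User are distinct agents doing different jobs: GPTBot collects content for model training, OAI-SearchBot indexes pages for ChatGPT search, and ChatGPT-User fetches a page when a user's request calls for it. Treating them as one is a common mistake. Getting this right is a core part of ChatGPT optimization.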
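Gate one is easy to verify yourself. The sketch below uses Python's standard `urllib.robotparser` to report which AI crawlers a robots.txt file would turn away from a given URL. The bot list and the sample rules are illustrative assumptions, so swap in your own file and the user-agent tokens each vendor documents.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; adjust to the vendors you care about.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]


def blocked_bots(robots_txt, url, bots=AI_BOTS):
    """Return the bots that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt that blocks one crawler and allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(SAMPLE_ROBOTS, "https://example.com/pricing"))
```

With the sample rules, only GPTBot is reported as blocked, which shows why each agent has to be checked by name rather than assumed from a single rule.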

🔍 Gate Two: Can the Bot Read You?

Access without rendering is a half-win. The table below combines both gates using the Vercel and MERJ findings.

AI Crawler Access and Rendering by Platform
Crawler	Used by	Renders JavaScript?
GPTBot	ChatGPT	No
OAI-SearchBot	ChatGPT search	No
ClaudeBot	Claude	No
PerplexityBot	Perplexity	No
Googlebot	Google, Gemini, AI Overviews	Yes
AppleBot	Apple Intelligence	Yes

⚠️ The Google "We Render JavaScript" Half-Truth

Google's stack does render JavaScript, which feeds Gemini and AI Overviews. But even Google hedges. As Smith notes, Google "has repeatedly said for years that it crawls JavaScript," while also wink-nodding that it may not do so as efficiently as it would like. This is why Google AI and Gemini optimization still cannot be your only bet.

The practical read is this. ChatGPT and Claude are effectively blind to client-side content today. Google gives you partial sight, on a delay. Betting your visibility on the one engine that renders, while ignoring the ones that do not, leaves most of the AI-search channel on the table. A Perplexity optimization plan has to account for that gap directly.

Q4. How Do I Check What an AI Crawler Actually Sees on My Page? [toc=4. How to Test]

To check what an AI crawler sees, disable JavaScript in your browser and reload the page. Whatever remains is roughly what GPTBot or ClaudeBot reads. Alternatively, view the raw page source and confirm your body copy is present, or run a site:yoururl.com search using a text snippet from the bottom of your page. If Google cannot find that snippet, your JavaScript is hiding content from crawlers.

Test it the way a bot sees it

🛠️ The Fastest Test: Disable JavaScript

You do not need a paid tool to start. Open your browser settings, disable JavaScript, and reload a key page. What stays on screen is close to what a non-rendering crawler ingests.
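If you would rather script the test, a rough equivalent is to download the page without a browser and look for your key sentences in the response. The sketch below is an approximation under stated assumptions: the short `GPTBot` user-agent string is a stand-in (real crawlers send longer strings), and `fetch_raw_html` and `missing_snippets` are names we made up for illustration.

```python
import urllib.request

# Assumed stand-in; check each vendor's documentation for the full string.
BOT_USER_AGENT = "GPTBot"


def fetch_raw_html(url, user_agent=BOT_USER_AGENT, timeout=10):
    """Download a page as plain HTML. No JavaScript is executed."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_snippets(raw_html, snippets):
    """Return the snippets that do not appear anywhere in the raw HTML."""
    haystack = " ".join(raw_html.split()).lower()
    return [s for s in snippets if " ".join(s.split()).lower() not in haystack]


# Example usage (performs a network request, so it is left commented out):
# html = fetch_raw_html("https://example.com/pricing")
# print(missing_snippets(html, ["Plans start at", "Compare integrations"]))
```

Pick snippets from the middle and bottom of the page, since those are the parts most likely to load late. A snippet that spans inline tags can show up as missing even when the text is present, so treat a miss as a prompt to view the source, not as proof.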

Rendering is only half of the test, because a crawler also has to find the page. A hub-only structure, where deep pages are reachable only through the homepage, leaves those pages orphaned and undiscovered. Smith's fix is strong internal cross-linking so bots can actually find your content. Point-to-point linking between related pages keeps nothing stranded, and it is a standard fix in our B2B SEO service.

🗄️ The Filing Cabinet Problem With Subdomains

Now picture two filing cabinets. An assistant told to search one will not open the other, because it looks like a separate system. That is roughly how discovery treats a subdomain like help.yoursite.com versus a subdirectory like yoursite.com/help.

Passing these tests only qualifies a page. Earning the citation is separate work. When you are ready to move from testing to fixing, talk to us about where your money pages actually stand.

Q5. Why Does JavaScript-Invisible Content Cost You Pipeline, Not Just Rankings? [toc=5. The Revenue Cost]

JavaScript-invisible content costs pipeline because AI search is close to binary: you are cited or you do not exist. When a buyer asks ChatGPT for the best tool in your category, only five to ten brands get named. If your content is invisible to the crawler, you do not rank low. You are absent from the shortlist where the deal actually gets decided.

Invisible pages fall out of the shortlist

💸 The Consideration Set Is the New Battlefield

The old model gave buyers hundreds of Google results to sift through. The AI model hands them a curated list of five to ten names. That list becomes the sample set, and if you are not on it, you are not in the buying conversation at all.

This is why we frame crawlability as a pipeline issue, not an IT ticket. An invisible page does not just lose rank. It removes you from the exact moment a buyer is choosing who to evaluate, which is precisely what our GEO service is built to protect.

📈 The Traffic Is Moving, and It Converts Harder

The shift is not hypothetical. Gartner projects that over 50% of search traffic will move from traditional engines to AI-native platforms by 2028. The traffic that flows through AI answers also behaves differently once it lands.

Webflow saw a 6x higher conversion rate from LLM traffic than from Google search traffic. It now draws 8% of its signups from LLMs, making it one of its top channels. Being invisible to ChatGPT and Claude means forfeiting your highest-intent buyers, not just some pageviews, which is why we tie every engagement to GEO ROI and revenue attribution.

⚠️ Not Every Page Needs the Fix

Here is the contrarian part. You do not need to server-render your entire site to protect revenue. In Ethan Smith's data, roughly one in twenty landing pages drives about 85% of all traffic, meaning the other nineteen split the remaining 15% or so.

That inverts the panic. The job is not "fix everything." It is "find the money pages and make those readable first." Spending an engineering sprint rendering low-traffic pages is cash you could put toward pages that actually convert.
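Finding those money pages is a small sorting exercise. The sketch below assumes you can export visits (or pipeline, or signups) per URL from your analytics tool. The function name and the 85% threshold are illustrative, borrowed from the ratio above, not a fixed rule.

```python
def money_pages(traffic_by_page, share=0.85):
    """Return the smallest set of top pages whose traffic reaches `share` of the total."""
    total = sum(traffic_by_page.values())
    ranked = sorted(traffic_by_page.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for page, visits in ranked:
        if total == 0 or running / total >= share:
            break  # the pages already selected cover the target share
        selected.append(page)
        running += visits
    return selected


# Hypothetical export: URL path mapped to monthly visits.
SAMPLE_TRAFFIC = {
    "/pricing": 6000,
    "/compare": 3000,
    "/blog/a": 400,
    "/blog/b": 300,
    "/about": 300,
}

print(money_pages(SAMPLE_TRAFFIC))
```

In the sample data, two of five pages clear the threshold, so those two are the ones to verify in raw HTML before anything else gets engineering time.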

✅ Reframing the Spend

Crawlability work, viewed this way, is pipeline protection with a clear priority order. Fix the pages where buyers decide, verify they exist in raw HTML, and leave the long tail for later. You can pressure-test your own pages with our AI crawlability checker before committing engineering time.

This is why our work at MaximusLabs starts with BOFU money pages. We make the roughly 5% of pages that drive most of the pipeline readable and citable first, not the whole site at once. That prioritization sits at the core of our B2B SEO service.

Q6. Which JavaScript Frameworks and Rendering Patterns Cause the Most AI-Invisibility? [toc=6. Frameworks That Break]

Pure client-side rendering (CSR) in single-page apps built with React, Vue, or Angular causes the most AI-invisibility. The initial HTML is nearly empty until JavaScript runs. Hybrid setups are also risky: the body may render server-side while reviews, pricing, or FAQs load asynchronously and stay invisible. The fix is server-side rendering (SSR), static generation (SSG), or incremental static regeneration (ISR) for your most important content.

Where the rendering quietly breaks

🧩 Client-Side Rendering Is the Top Offender

A single-page app (SPA) is a site that loads one HTML shell, then builds every page with JavaScript in the browser. Humans see a full page. A non-rendering crawler sees the near-empty shell.

That gap is the core problem. Ethan Smith's technical guidance is blunt: keep pages crawlable and "avoid weird JavaScript," because bots need content in the HTML, not assembled after the fact. Sorting this out is standard work in a technical SEO and website audit.

⚠️ The Hybrid-Rendering Trap

Partial rendering is sneakier than full CSR. Your main copy may render server-side and look perfectly fine, while the high-value blocks load asynchronously and vanish for crawlers.

Smith showed exactly this in a live check, where reviews "are loaded in asynchronously and they're not seen," leaving "half the page" effectively unindexable. Reviews, pricing, and FAQ content are often the very things buyers ask AI engines about, so losing them is expensive, and it directly undermines answer engine optimization.
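A quick way to catch this trap is to check whether the containers for those blocks hold any text in the raw HTML. The sketch below assumes your template gives each block an `id` (the ids `pricing`, `reviews`, and `faq` are hypothetical) and that the markup is reasonably well-formed. It is a rough probe, not a full HTML validator.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BlockText(HTMLParser):
    """Collects the text found inside elements whose id is in `ids`."""

    def __init__(self, ids):
        super().__init__()
        self.text = {block_id: "" for block_id in ids}
        self.current = None    # id of the tracked block we are inside
        self.depth = 0         # nesting depth within that block
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.current is None:
            block_id = dict(attrs).get("id")
            if block_id in self.text:
                self.current, self.depth = block_id, 1
            return
        self.depth += 1
        if tag in ("script", "style"):
            self.skipping = True

    def handle_endtag(self, tag):
        if self.current is None or tag in VOID:
            return
        if tag in ("script", "style"):
            self.skipping = False
        self.depth -= 1
        if self.depth == 0:
            self.current = None

    def handle_data(self, data):
        if self.current is not None and not self.skipping:
            self.text[self.current] += data


def empty_blocks(raw_html, ids):
    """Return the ids whose containers are missing or hold no readable text."""
    parser = BlockText(ids)
    parser.feed(raw_html)
    parser.close()
    return [block_id for block_id in ids if not parser.text[block_id].strip()]
```

Run it against the raw HTML of a key page. Any id it returns is a block that a human sees but a non-rendering crawler most likely does not.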

🔎 Why This Undercuts AI Retrieval Specifically

There is a second-order cost beyond simple visibility. AI answers rely on retrieval-augmented generation (RAG), where the engine searches, pulls source text, and summarizes it. If your key facts only exist in the rendered DOM, they never enter that retrieval pool.

So a hybrid page can pass a casual glance yet still fail where it matters. The comparison table a buyer needs, the integration detail, the price: none of it reaches the model, which is why technical GEO implementation starts at the rendering layer.

✅ The SSR, SSG, and ISR Decision Guide

The fix is to render important content before it reaches the browser. Match the method to the page:

SSR (server-side rendering): the server builds the full HTML per request. Best for pages that change often, like dashboards or personalized pricing.
SSG (static site generation): pages are pre-built at deploy time. Best for stable content, like comparison and BOFU pages.
ISR (incremental static regeneration): static pages that are regenerated in the background after a set interval, without a full redeploy. A middle path for large catalogs that update periodically.

The practical rule is simple. Whatever a buyer or a crawler must read to choose you should exist in the raw HTML, not appear only after scripts run. If you build on Webflow, our Webflow SEO guide walks through how to keep that content server-rendered.
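The decision guide above can be written down as a small rule of thumb. This sketch reduces each page to two yes-or-no traits, which is a simplification we are assuming for illustration. Real projects weigh caching, build times, and framework support as well, and the page paths are hypothetical.

```python
def rendering_strategy(changes_per_request, updates_between_deploys):
    """Map two page traits to SSR, ISR, or SSG, following the guide above."""
    if changes_per_request:
        return "SSR"  # built per request, for personalized or fast-changing pages
    if updates_between_deploys:
        return "ISR"  # static, but regenerated periodically
    return "SSG"      # pre-built at deploy time, for stable content


# Hypothetical inventory: path -> (changes_per_request, updates_between_deploys)
PAGES = {
    "/pricing/custom-quote": (True, True),
    "/compare/acme-vs-other": (False, False),
    "/integrations/catalog": (False, True),
}

PLAN = {path: rendering_strategy(*traits) for path, traits in PAGES.items()}
print(PLAN)
```

Whichever label a page gets, the outcome to verify is the same: the content a buyer needs is present in the HTML the server sends.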

Q7. Does Site Architecture and Internal Linking Affect AI Crawlability Too? [toc=7. Architecture and Linking]

Yes. Even readable HTML can go undiscovered if your architecture traps crawlers. Sites that force every path through a central homepage hub leave deep pages orphaned. Content split onto subdomains often gets treated as a separate property that AI agents will not automatically check. The fix is point-to-point internal linking and keeping key content, like your help center, in a subdirectory rather than a subdomain.

Readable still needs to be reachable

✈️ The Airline Route Map Problem

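Picture two airlines. One flies point-to-point, so any city connects directly to many others. The other routes everything through a single hub, so if you cannot reach the hub, you cannot reach anywhere. Sites work the same way for crawlers. When every path runs through the homepage, deep pages are one missing link away from being unreachable, while point-to-point links between related pages keep them discoverable.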
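You can audit the route map with a breadth-first walk over your internal links. The sketch assumes you already have a link graph, for example from a site crawl export, as a dictionary of page to outgoing links. The function names and sample paths are ours.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first search over internal links. Returns {page: clicks from start}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphaned_pages(links, all_pages, start="/"):
    """Return pages that exist but cannot be reached by following links from start."""
    reachable = click_depths(links, start)
    return sorted(page for page in all_pages if page not in reachable)


# Hypothetical site: /help/setup links out, but nothing links to it.
SAMPLE_LINKS = {
    "/": ["/pricing", "/blog"],
    "/blog": ["/blog/a"],
    "/help/setup": ["/pricing"],
}
SAMPLE_PAGES = {"/", "/pricing", "/blog", "/blog/a", "/help/setup"}

print(orphaned_pages(SAMPLE_LINKS, SAMPLE_PAGES))
```

Orphaned pages need at least one inbound link. Pages with a high click depth are candidates for direct links from related content.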

🗄️ The Filing Cabinet Problem With Subdomains

A subdomain such as help.yoursite.com tends to be treated as a separate property from the main site, so a crawler working through one may never open the other. Smith is direct on this: move your help center to a subdirectory, because "subdirectories perform better" than subdomains for visibility. Consolidating content into one property, then cross-linking it, gives crawlers a single, connected map to follow. This matters most for help centers, which answer the exact long-tail questions buyers ask AI engines, and it feeds directly into B2B SaaS AEO strategies.

Our hub-and-spoke architecture at MaximusLabs is built for this. Every spoke page sits one internal link from its hub, so no page gets orphaned, and the structure reads as one connected property rather than scattered filing cabinets. If you want a second set of eyes on your structure, you can talk to us.

Q8. Will llms.txt, Schema, or Page Speed Fix My AI Crawlability? [toc=8. Silver-Bullet Myths]

No single file fixes AI crawlability. llms.txt is an optional hint that cannot make JavaScript-rendered content readable. Schema helps machines parse context but does not rescue content that a crawler never reads. Page speed shows little measurable impact on AI citations. The real fix stays boring: put your important content in raw HTML where crawlers can actually see it.

There is no silver-bullet file

🔍 Why Everyone Wants a Silver Bullet

Rendering fixes require engineering time, so teams reach for a file they can add themselves. It feels productive. It rarely moves the needle on visibility.

The most-hyped candidate is llms.txt, a proposed file meant to guide language models to your key content. On the technical impact of these files, Smith is skeptical: as far as he knows, no major LLM company is actually using llms.txt. A hint file cannot make invisible content readable if the crawler never executes the JavaScript that content depends on. If you still want to experiment, our llms.txt generator makes it a five-minute test rather than a project.

⚠️ Robots.txt and Schema, Handled Honestly

Two related tools get overrated in different directions. Robots.txt can block LLMs from training on your data, but it will not increase your traffic. It controls permission, not readability.

Schema markup (structured data that labels your content for machines) is more useful, though not a fix for invisibility. Smith rates it as genuinely impactful for reviews, products, and locations. But schema describes content that already exists in the HTML. If the underlying content is client-side and unread, marking it up changes nothing, a point we cover in schema markup basics.
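For reference, this is roughly what review markup looks like when it is generated on the server. The sketch builds a schema.org `Product` with an `AggregateRating` as JSON-LD. The product name and numbers are placeholders, and the helper `product_jsonld` is our own illustration, not part of any library.

```python
import json


def product_jsonld(name, rating, review_count):
    """Return a JSON-LD <script> tag describing a product and its review summary."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


print(product_jsonld("Acme CRM", 4.6, 128))
```

The tag only helps if it ships in the server response next to visible content that says the same thing. Injecting it with client-side JavaScript recreates the invisibility problem the markup was meant to support.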

💡 The Boring Fix Is the Real Fix

Strip away the hype and the pattern is consistent. Crawlability, strong internal linking, and clean schema on content that is actually present are the levers that work. There is no tag, code, or trick that substitutes for readable HTML, which is why we treat it as the foundation of generative engine optimization.

We test these claims before recommending them, and at MaximusLabs the pattern holds. HTML visibility plus real trust signals beat file-based silver bullets every time. The unglamorous move, putting your content where crawlers can read it, is the one that decides whether you get cited.

Q9. If My Content Is Already Readable, How Do I Actually Get Cited? [toc=9. From Readable to Cited]

Being readable only makes you eligible. To get cited, you need trust-first, revenue-focused content: a clear answer-first structure, demonstrated expertise, primary sources, and consistent mentions where buyers and models look. Fix rendering to qualify. Build trust to win.

Readable is the floor, trust is the win

Readable is a floor, not a finish line. Ethan Smith is direct that clean crawlability is "the floor," and the actual contest happens on top of it. Being present in the retrieval pool is necessary, but it does not decide who gets quoted, which is exactly where answer engine optimization takes over.

⚠️ Why Readable Pages Still Get Ignored

So the real work is trust-first and revenue-focused: clear answer-first structure, demonstrated expertise, primary sources, and consistent mentions where buyers and models look. Fix rendering to qualify. Build trust to win. That trust layer is the heart of our GEO service, and it draws directly on our trust-first content playbook.

This is the line most agencies stop at. They fix the crawl errors and leave. MaximusLabs treats crawlability as step one of a trust-first system built to make you the answer AI engines cite, across ChatGPT, Perplexity, Gemini, and Claude.

Here is the question we are sitting with. As more brands fix rendering over the next two years, readability stops being an edge and trust becomes the only real moat. If everyone is crawlable, what will your brand be cited for?

Q10. Who Fixes AI Crawlability When Engineering Says It'll Take Nine Months? [toc=10. Who Ships the Fix]

Crawlability fixes usually stall not because they are hard, but because marketing cannot command engineering time. High-impact fixes, like server-rendering your money pages, can ship in days, yet get quoted at nine months in the backlog. The unlock is prioritization plus implementation: start with the roughly 5% of pages that drive most of the pipeline, and pair marketing with a team that ships the fix rather than just advising on it.

The blocker is the queue, not the code

⏰ The Real Blocker Is Organizational, Not Technical

Walk into most companies and the crawlability problem is understood. It just sits in a queue nobody in marketing controls. The person who sees the invisibility is rarely the person who can deploy the fix.

Ethan Smith describes this gap plainly: much of this work "could be built in weeks or days," but the engineering team quotes something like nine months against competing priorities. The fix is small. The line to get it done is long. Untangling that is a core part of a technical SEO and website audit.

💸 The Nine-Month Quote Kills Momentum

Smith's own answer to the bottleneck was to build a team that ships fast, precisely because waiting on engineering was the constraint. Early movers also compound: it is often faster to get cited in AI answers than to rank in Google, so speed to visibility is a real advantage. That speed advantage sits at the center of our B2B SEO service.

The priority order matters as much as the speed. Fix the roughly 5% of pages that drive most of your pipeline first, verify they exist in raw HTML, and leave the long tail for later. You can pinpoint which pages fail using our AI crawlability checker before anyone opens a ticket.

This is exactly how we work at MaximusLabs. We do not just hand engineering a list and walk away. We prioritize the money pages, ship the rendering fix ourselves, and build the trust layer on top, which is why technical GEO implementation and content sit under one roof. If your fix is stuck in a nine-month queue, talk to us about shipping it in days instead.

Krishna Kanth

I’m KK. Over the years, I’ve experimented with and built systems that drive growth through AEO and GEO. Today, I help brands turn AI search into revenue engines, not vanity metrics, by delivering AI visibility and getting brands cited and chosen across ChatGPT, Perplexity, and Google, where real buying decisions happen.