Can Claude Read Your Website? A Live Experiment in AI Legibility
Date: 2026-03-09
Author: John Brennan
Source: https://johnbrennan.xyz/essay/can-claude-read-your-website
A live case study in which Claude Opus 4.6 attempted to read three websites — johnbrennan.xyz, agentweekly.ai, and aitoonup.com — revealing which design patterns make content visible to AI agents and which leave sites completely dark.
---
TL;DR
We conducted a live experiment asking Claude Opus 4.6 to discover and read content across three websites built as React single-page applications with Express backends. At the start of the session, all three sites were effectively invisible — Claude received empty HTML shells with no article content, no navigation, and no discoverable paths to any content. Over several hours of iterative testing, debugging, and deployment, we identified which artifacts make a site legible to AI agents and which failures leave it dark. The single most impactful change was a plain-text sitemap (`sitemap.txt`) — one file, one URL per line, that transformed a completely opaque site into one Claude could navigate autonomously. The experiment also revealed that server-side HTML injection, structured Markdown endpoints, `llms.txt` directories, homepage discovery links, and correct MIME types each play distinct and complementary roles in AI legibility. A final test of the Unified TOON Meta-Index (`utmi.toon`) demonstrated that consolidating crawl rules, site index, AI summaries, and API tool registration into a single token-optimized file is viable and immediately useful to an AI agent — provided the file is served with a text MIME type rather than the default binary content type that web servers assign to unknown file extensions.
Key Takeaways
React single-page applications are invisible to AI agents by default. Claude's fetch tools do not execute JavaScript, so any content rendered client-side does not exist from the agent's perspective.
A plain-text sitemap (sitemap.txt) was the single most impactful artifact. Once provided, Claude could autonomously discover and read every piece of content on a site.
Server-side HTML injection works — but edge caching can mask it entirely. A working injection pipeline appeared broken for over an hour because stale cached responses were being served.
Markdown endpoints (.md) are the ideal content format for AI agents. Structured front matter, clean hierarchy, and explicit metadata allow an LLM to parse, cite, and reason about content with zero friction.
Homepage discovery is the critical gap. If the homepage returns nothing navigable, an AI agent has no starting point — even if every other endpoint works perfectly.
MIME types for novel file formats must be explicitly configured. A .toon file served as application/octet-stream is unreadable binary to an AI agent, regardless of how well-designed the format is.
The UTMI format (utmi.toon) consolidates robots.txt, sitemaps, llms.txt, metadata, and API tool registration into a single file that Claude could parse immediately once the MIME type was corrected — demonstrating that unified site manifests are viable and useful for AI agents.
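The MIME-type fix in the last two takeaways can be captured in a few lines of server code. The sketch below is hypothetical (the helper name, the extension table, and the Express wiring are assumptions, not the sites' actual code); it shows the general pattern of overriding the `application/octet-stream` default for text-based formats the server does not recognize:

```javascript
// NOTE: hypothetical helper, not the actual site code. Extensions the
// server would otherwise serve as application/octet-stream get an
// explicit text MIME type so AI agents receive readable text.
const TEXT_TYPES = {
  ".toon": "text/plain; charset=utf-8",
  ".txt": "text/plain; charset=utf-8",
  ".md": "text/markdown; charset=utf-8",
};

function contentTypeFor(filePath) {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot);
  return TEXT_TYPES[ext] || null; // null: let the server's default stand
}

// Wired into Express's static middleware, this would look like:
//   app.use(express.static("public", {
//     setHeaders: (res, p) => {
//       const type = contentTypeFor(p);
//       if (type) res.setHeader("Content-Type", type);
//     },
//   }));
```

The key design point is that the override lists only the formats you know are text; everything else keeps the server's default behavior.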
Definitions
AI legibility: The degree to which a website's content is discoverable, accessible, and parseable by AI agents and large language models without requiring JavaScript execution.
SPA shell: The minimal HTML document (index.html) served by a single-page application before JavaScript renders the actual content. Typically contains only an empty root `<div>` element and a page title.
Server-side injection: The practice of inserting static content (article text, metadata, structured data) into the HTML response on the server before it reaches the client, so that non-JavaScript clients receive full content.
Cold-start discovery: An AI agent's ability to find content on a site it has never visited before, starting from only the domain URL.
UTMI (Unified TOON Meta-Index): A token-optimized file (utmi.toon) that consolidates crawl control, site index, AI grounding summaries, API tool registration, and metadata into a single machine-readable manifest.
The Starting Point: Three Dark Sites
The experiment began with a simple request: read the articles on johnbrennan.xyz. Claude fetched the homepage and received this:
Change Log — John Brennan
Nothing else. No navigation, no links, no article text, no metadata. The same was true for every essay URL — /essay/building-the-cognitive-factory returned only the SPA title. The site was, from Claude's perspective, empty.
A Google search for site:johnbrennan.xyz returned zero results. The site had no indexed pages. Claude tried llms.txt, robots.txt, and sitemap.xml — all were blocked by Claude's fetch tool, which only allows access to URLs that have been provided by the user or discovered through prior results. Since the homepage contained nothing, there were no URLs to follow.
The same was true for aitoonup.com — a site specifically about making websites discoverable to AI was itself invisible to an AI agent.
The sites existed. The content was there. But from the perspective of an AI agent, they were dark.
Phase 1: The Markdown Breakthrough
The first breakthrough came when the site owner provided a direct URL to a Markdown endpoint: johnbrennan.xyz/essays/building-the-cognitive-factory.md.
Claude fetched it and received the full article — clean Markdown with structured front matter:
# Building the Cognitive Factory
**Date:** 2026-03-08
**Author:** John Brennan
**Source:** https://johnbrennan.xyz/essay/building-the-cognitive-factory
> The firm is adding a second class of worker...
The article included a TL;DR, Key Takeaways, Definitions, a clear heading hierarchy, and the canonical URL at the bottom. From Claude's perspective, this was the ideal input format — every piece of metadata an LLM needs to parse, cite, and reason about content was explicitly present.
But there was a problem: Claude could not have found this URL on its own. The Markdown endpoint worked perfectly once a human provided the link. Without that human, the content remained undiscoverable.
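One reason this format is so friendly to machines is how trivially it parses. The sketch below is a hypothetical parser (the function name and field handling are assumptions, not the site's actual code) for the front-matter style shown above: an H1 title followed by `**Date:**` / `**Author:**` / `**Source:**` lines:

```javascript
// NOTE: hypothetical parser for the bold-label front matter shown above;
// not the site's actual code. Extracts the H1 title and any
// **Field:** value lines into a metadata object.
function parseEssayMarkdown(md) {
  const result = { title: null, meta: {} };
  for (const line of md.split("\n")) {
    const h1 = line.match(/^# (.+)$/);
    if (h1) { result.title = h1[1]; continue; }
    const field = line.match(/^\*\*(\w+):\*\*\s*(.+)$/);
    if (field) result.meta[field[1].toLowerCase()] = field[2];
  }
  return result;
}
```

A dozen lines and two regular expressions recover the title, date, author, and canonical source URL; no headless browser, no JavaScript execution, no DOM.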
Phase 2: The Sitemap That Unlocked Everything
The second breakthrough was sitemap.txt — a plain-text file containing one URL per line:
https://johnbrennan.xyz
https://johnbrennan.xyz/essay/veridread-and-lucent-surrender
https://johnbrennan.xyz/essays/veridread-and-lucent-surrender.md
https://johnbrennan.xyz/essay/building-the-cognitive-factory
https://johnbrennan.xyz/essays/building-the-cognitive-factory.md
...
The moment Claude received this single URL, the entire site opened up. Claude could see every essay, every Markdown endpoint, and every infographic. It autonomously fetched and read eight essays and three infographics, understanding the full intellectual scope of the site — from a 2011 CIA Studies in Intelligence paper on outlier analysis to a 2026 essay on organizational redesign for the agentic era.
The contrast was stark. Before sitemap.txt: the site was completely dark. After sitemap.txt: Claude could navigate the entire site autonomously.
No other single artifact had this much impact. The XML sitemap (sitemap.xml) existed but was initially served as application/xml, which Claude's fetch tool treated as binary data. The plain-text version, requiring no parsing and no special content type, worked immediately.
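A sitemap in this format is cheap to generate. The sketch below is hypothetical (`buildSitemapTxt`, the slug list, and the route wiring are assumptions, not the site's actual code); it pairs each canonical HTML route with its Markdown endpoint, one absolute URL per line, and serves the result with an explicit text MIME type:

```javascript
// NOTE: hypothetical sketch, not the site's actual code.
const ORIGIN = "https://johnbrennan.xyz";

function buildSitemapTxt(slugs) {
  const lines = [ORIGIN];
  for (const slug of slugs) {
    lines.push(`${ORIGIN}/essay/${slug}`);     // canonical HTML route
    lines.push(`${ORIGIN}/essays/${slug}.md`); // Markdown endpoint
  }
  return lines.join("\n") + "\n";
}

// An Express route would serve it as plain text:
//   app.get("/sitemap.txt", (req, res) => {
//     res.type("text/plain").send(buildSitemapTxt(essaySlugs));
//   });
```

Because the output is plain text served as `text/plain`, there is nothing to mis-parse and no content type to misconfigure.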
Phase 3: The Caching Trap
A parallel investigation examined why the HTML endpoints — the canonical essay URLs — returned empty shells. The design called for server-side injection: the Express server was supposed to intercept essay routes and inject a hidden block,