Can Claude Read Your Website? A Live Experiment in AI Legibility
Date: 2026-03-09
Author: John Brennan
Source: https://johnbrennan.xyz/essay/can-claude-read-your-website
A live case study in which Claude Opus 4.6 attempted to read three websites — johnbrennan.xyz, agentweekly.ai, and aitoonup.com — revealing which design patterns make content visible to AI agents and which leave sites completely dark.
---
TL;DR
We conducted a live experiment asking Claude Opus 4.6 to discover and read content across three websites built as React single-page applications with Express backends. At the start of the session, all three sites were effectively invisible — Claude received empty HTML shells with no article content, no navigation, and no discoverable paths to any content. Over several hours of iterative testing, debugging, and deployment, we identified which artifacts make a site legible to AI agents and which failures leave it dark. The single most impactful change was a plain-text sitemap (`sitemap.txt`) — one file, one URL per line, that transformed a completely opaque site into one Claude could navigate autonomously. The experiment also revealed that server-side HTML injection, structured Markdown endpoints, `llms.txt` directories, homepage discovery links, and correct MIME types each play distinct and complementary roles in AI legibility. A final test of the Unified TOON Meta-Index (`utmi.toon`) demonstrated that consolidating crawl rules, site index, AI summaries, and API tool registration into a single token-optimized file is viable and immediately useful to an AI agent — provided the file is served with a text MIME type rather than the default binary content type that web servers assign to unknown file extensions.
Key Takeaways
React single-page applications are invisible to AI agents by default. Claude's fetch tools do not execute JavaScript, so any content rendered client-side does not exist from the agent's perspective.
A plain-text sitemap (sitemap.txt) was the single most impactful artifact. Once provided, Claude could autonomously discover and read every piece of content on a site.
Server-side HTML injection works — but edge caching can mask it entirely. A working injection pipeline appeared broken for over an hour because stale cached responses were being served.
Markdown endpoints (.md) are the ideal content format for AI agents. Structured front matter, clean hierarchy, and explicit metadata allow an LLM to parse, cite, and reason about content with zero friction.
Homepage discovery is the critical gap. If the homepage returns nothing navigable, an AI agent has no starting point — even if every other endpoint works perfectly.
MIME types for novel file formats must be explicitly configured. A .toon file served as application/octet-stream is unreadable binary to an AI agent, regardless of how well-designed the format is.
The UTMI format (utmi.toon) consolidates robots.txt, sitemaps, llms.txt, metadata, and API tool registration into a single file that Claude could parse immediately once the MIME type was corrected — demonstrating that unified site manifests are viable and useful for AI agents.
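The MIME-type fix in the last two takeaways can be captured in a few lines of server code. The sketch below is hypothetical (the helper name, the extension table, and the Express wiring are assumptions, not the sites' actual code); it shows the general pattern of overriding the `application/octet-stream` default for text-based formats the server does not recognize:

```javascript
// NOTE: hypothetical helper, not the actual site code. Extensions the
// server would otherwise serve as application/octet-stream get an
// explicit text MIME type so AI agents receive readable text.
const TEXT_TYPES = {
  ".toon": "text/plain; charset=utf-8",
  ".txt": "text/plain; charset=utf-8",
  ".md": "text/markdown; charset=utf-8",
};

function contentTypeFor(filePath) {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot);
  return TEXT_TYPES[ext] || null; // null: let the server's default stand
}

// Wired into Express's static middleware, this would look like:
//   app.use(express.static("public", {
//     setHeaders: (res, p) => {
//       const type = contentTypeFor(p);
//       if (type) res.setHeader("Content-Type", type);
//     },
//   }));
```

The key design point is that the override lists only the formats you know are text; everything else keeps the server's default behavior.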
Definitions
AI legibility: The degree to which a website's content is discoverable, accessible, and parseable by AI agents and large language models without requiring JavaScript execution.
SPA shell: The minimal HTML document (index.html) served by a single-page application before JavaScript renders the actual content. Typically contains only an empty root `<div>` element and a page title.
Server-side injection: The practice of inserting static content (article text, metadata, structured data) into the HTML response on the server before it reaches the client, so that non-JavaScript clients receive full content.
Cold-start discovery: An AI agent's ability to find content on a site it has never visited before, starting from only the domain URL.
UTMI (Unified TOON Meta-Index): A token-optimized file (utmi.toon) that consolidates crawl control, site index, AI grounding summaries, API tool registration, and metadata into a single machine-readable manifest.
The Starting Point: Three Dark Sites
The experiment began with a simple request: read the articles on johnbrennan.xyz. Claude fetched the homepage and received this:
Change Log — John Brennan
Nothing else. No navigation, no links, no article text, no metadata. The same was true for every essay URL — /essay/building-the-cognitive-factory returned only the SPA title. The site was, from Claude's perspective, empty.
A Google search for site:johnbrennan.xyz returned zero results. The site had no indexed pages. Claude tried llms.txt, robots.txt, and sitemap.xml — all were blocked by Claude's fetch tool, which only allows access to URLs that have been provided by the user or discovered through prior results. Since the homepage contained nothing, there were no URLs to follow.
The same was true for aitoonup.com — a site specifically about making websites discoverable to AI was itself invisible to an AI agent.
The sites existed. The content was there. But from the perspective of an AI agent, they were dark.
Phase 1: The Markdown Breakthrough
The first breakthrough came when the site owner provided a direct URL to a Markdown endpoint: johnbrennan.xyz/essays/building-the-cognitive-factory.md.
Claude fetched it and received the full article — clean Markdown with structured front matter:
# Building the Cognitive Factory
**Date:** 2026-03-08
**Author:** John Brennan
**Source:** https://johnbrennan.xyz/essay/building-the-cognitive-factory
> The firm is adding a second class of worker...
The article included a TL;DR, Key Takeaways, Definitions, a clear heading hierarchy, and the canonical URL at the bottom. From Claude's perspective, this was the ideal input format — every piece of metadata an LLM needs to parse, cite, and reason about content was explicitly present.
But there was a problem: Claude could not have found this URL on its own. The Markdown endpoint worked perfectly once a human provided the link. Without that human, the content remained undiscoverable.
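One reason this format is so friendly to machines is how trivially it parses. The sketch below is a hypothetical parser (the function name and field handling are assumptions, not the site's actual code) for the front-matter style shown above: an H1 title followed by `**Date:**` / `**Author:**` / `**Source:**` lines:

```javascript
// NOTE: hypothetical parser for the bold-label front matter shown above;
// not the site's actual code. Extracts the H1 title and any
// **Field:** value lines into a metadata object.
function parseEssayMarkdown(md) {
  const result = { title: null, meta: {} };
  for (const line of md.split("\n")) {
    const h1 = line.match(/^# (.+)$/);
    if (h1) { result.title = h1[1]; continue; }
    const field = line.match(/^\*\*(\w+):\*\*\s*(.+)$/);
    if (field) result.meta[field[1].toLowerCase()] = field[2];
  }
  return result;
}
```

A dozen lines and two regular expressions recover the title, date, author, and canonical source URL; no headless browser, no JavaScript execution, no DOM.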
Phase 2: The Sitemap That Unlocked Everything
The second breakthrough was sitemap.txt — a plain-text file containing one URL per line:
https://johnbrennan.xyz
https://johnbrennan.xyz/essay/veridread-and-lucent-surrender
https://johnbrennan.xyz/essays/veridread-and-lucent-surrender.md
https://johnbrennan.xyz/essay/building-the-cognitive-factory
https://johnbrennan.xyz/essays/building-the-cognitive-factory.md
...
The moment Claude received this single URL, the entire site opened up. Claude could see every essay, every Markdown endpoint, and every infographic. It autonomously fetched and read eight essays and three infographics, understanding the full intellectual scope of the site — from a 2011 CIA Studies in Intelligence paper on outlier analysis to a 2026 essay on organizational redesign for the agentic era.
The contrast was stark. Before sitemap.txt: the site was completely dark. After sitemap.txt: Claude could navigate the entire site autonomously.
No other single artifact had this much impact. The XML sitemap (sitemap.xml) existed but was initially served as application/xml, which Claude's fetch tool treated as binary data. The plain-text version, requiring no parsing and no special content type, worked immediately.
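A sitemap in this format is cheap to generate. The sketch below is hypothetical (`buildSitemapTxt`, the slug list, and the route wiring are assumptions, not the site's actual code); it pairs each canonical HTML route with its Markdown endpoint, one absolute URL per line, and serves the result with an explicit text MIME type:

```javascript
// NOTE: hypothetical sketch, not the site's actual code.
const ORIGIN = "https://johnbrennan.xyz";

function buildSitemapTxt(slugs) {
  const lines = [ORIGIN];
  for (const slug of slugs) {
    lines.push(`${ORIGIN}/essay/${slug}`);     // canonical HTML route
    lines.push(`${ORIGIN}/essays/${slug}.md`); // Markdown endpoint
  }
  return lines.join("\n") + "\n";
}

// An Express route would serve it as plain text:
//   app.get("/sitemap.txt", (req, res) => {
//     res.type("text/plain").send(buildSitemapTxt(essaySlugs));
//   });
```

Because the output is plain text served as `text/plain`, there is nothing to mis-parse and no content type to misconfigure.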
Phase 3: The Caching Trap
A parallel investigation examined why the HTML endpoints — the canonical essay URLs — returned empty shells. The design called for server-side injection: the Express server was supposed to intercept essay routes and inject a hidden block,