Building the Cognitive Factory

Date: 2026-03-08
Author: John Brennan
Source: https://johnbrennan.xyz/essay/building-the-cognitive-factory

The firm is adding a second class of worker: the agentic employee. But intelligence alone is not the product—the harness is. The winners will redesign themselves into cognitive factories where humans steer, agents execute, and organizational architecture becomes the durable competitive edge.

---

From Biological Employees to Agentic Employees—and the New Operating System of the Firm

TL;DR

Companies are adding a second class of worker—the agentic employee (AE)—alongside the biological employee (BE). But raw model intelligence is not the bottleneck. The bottleneck is organizational legibility: converting tribal knowledge into system knowledge, building continuity artifacts that survive context-window limits, encoding human taste into enforceable constraints, and designing verification into the workflow rather than bolting it on afterward. The firms that win will not simply deploy agents. They will redesign themselves into cognitive factories—integrated production systems where humans steer and agents execute, and where the scarce resource is human attention allocated to its highest-leverage use.

Key Takeaways

- The transition from BE-only firms to BE+AE firms is an epoch change comparable to electrification—not a software upgrade. The large gains come from organizational redesign, not tool adoption.
- Intelligence alone is not the product; the harness is. Harness engineering—specifying intent, exposing tools, building feedback loops, and preserving continuity—determines whether agents produce reliable work or accelerated disorder.
- What the agent cannot see, it cannot reason about. Organizations must convert tribal knowledge, oral tradition, and scattered informal context into versioned, structured system knowledge that agents can inspect.
- Complex agentic work rarely fits inside a single context window. Continuity artifacts—state logs, progress files, decision memos, test suites—allow work to survive shift changes between humans and agents.
- Agents accelerate entropy as well as output. Quality decay must be treated as a first-class management issue through encoded standards, continuous cleanup, and structural enforcement—not ad hoc heroics.
- The scarce resource in the cognitive factory is human attention. The economic logic of the firm shifts toward designing organizations so that human judgment is spent only where it creates the most leverage.

Definitions

Biological employee (BE): The traditional human worker whose cognitive labor—analysis, judgment, planning, writing, design, supervision—has historically constituted the entire thinking layer of the firm.

Agentic employee (AE): A persistent, tool-using, goal-directed AI system that can interpret intent, gather context, execute tasks, test its work, revise its output, and continue operating inside a workflow—functioning as a second class of worker within the firm.

Cognitive factory: An integrated production system that organizes biological employees and agentic employees around cognition itself, analogous to how industrial factories organized machines around physical production and information-age firms organized computers around data.

Harness engineering: The practice of building the operating environment around an agent—specifying intent, connecting tools, encoding rules, creating verification loops, and preserving continuity—so that model intelligence becomes reliable, legible, and economically useful.

System of record: A versioned, structured knowledge base (documentation, architectural specs, domain rules) that serves as the authoritative source of truth for both humans and agents, replacing scattered tribal knowledge, chat threads, and oral tradition.
Retooling lag: The delay between the adoption of a transformative technology and the appearance of its aggregate productivity effects, caused by the time required to redesign workflows, management systems, and complementary infrastructure.

For most of modern economic history, the firm employed only one kind of worker: the biological employee.

That sounds almost too obvious to say out loud. Of course companies were made of people. Of course the cognitive layer of the organization—analysis, judgment, planning, writing, design, supervision—was human. Machines could amplify muscle. Software could accelerate workflows. But the actual thinking labor of the firm still lived inside human beings.

That is beginning to change.

The shift is easy to misunderstand because we keep describing it in the language of tools. We say AI is a better assistant, a stronger copilot, a faster research engine, a more capable coding companion. All of that is true, and all of it is too small.

A better way to see what is happening is this: companies are starting to add a second class of worker to the firm. Not a humanoid robot. Not a chatbot tab left open in a browser. A persistent, tool-using, goal-directed system that can interpret intent, gather context, execute tasks, test its work, revise its output, and continue operating inside a workflow.

In other words, an agentic employee.

The acronyms are admittedly blunt, but they are useful. For most of corporate history, firms were built around the BE: the Biological Employee. The next era will be built around hybrid teams composed of BEs and AEs: Agentic Employees.

This is why the right analogy is not "software upgrade." It is "epoch change." In the industrial age, firms learned how to organize machines around physical production. In the information age, they learned how to organize computers around data and communication. In the agentic age, they will have to learn how to organize humans and agents around cognition itself.
They will have to build what might be called the cognitive factory.

That phrase is not a metaphor for replacing people. It is a description of a new managerial problem. Once agents become capable enough to do meaningful work, the central question is no longer whether the model is smart. The central question becomes whether the organization knows how to make that intelligence reliable, legible, and economically useful.

And that is where the discussion gets serious. Because the first real lesson of the agent era is that intelligence alone is not the product. The harness is.

The OpenAI Case: Humans Steer, Agents Execute

OpenAI's own recent internal case study is one of the clearest demonstrations of this shift. In February 2026, Ryan Lopopolo described how a small OpenAI team spent roughly five months building and shipping an internal beta software product with "0 lines of manually-written code." According to the post, Codex generated the application logic, tests, CI configuration, documentation, observability, and internal tooling; the repository grew to roughly a million lines of code; and around 1,500 pull requests were opened and merged. The team estimates they built it in roughly one-tenth the time it would have taken to hand-write the code. Their summary of the new division of labor was stark: "Humans steer. Agents execute." (OpenAI)

That sentence may end up mattering far beyond software.

Humans steer. Agents execute. There, in eight syllables, is the emerging operating system of the hybrid firm.

The old role of the engineer was to write code directly. The new role, at least in that environment, was to specify intent, design the environment, structure the documentation, expose the right tools, create feedback loops, and build the conditions under which the agent could do reliable work. OpenAI's own account says early progress was slower than expected not because Codex lacked raw capability, but because the environment was underspecified.
The bottleneck was not model intelligence. It was organizational legibility and control. (OpenAI)

That is the real story of the BE-to-AE transition. The first generation of companies will assume the challenge is adopting models. The winners will realize the challenge is redesigning the firm so that agents can work.

What Harness Engineering Actually Means

This is what harness engineering actually means. It does not mean writing a clever prompt and hoping for the best. It means building the operating environment around the agent. It means turning vague goals into explicit tasks. It means connecting the model to the tools, files, rules, logs, metrics, and memory structures it needs to reason effectively. It means creating tests that can tell whether the work is good. It means building constraints that prevent decay. And it means leaving behind artifacts so that the next run, the next agent, or the next human can understand what happened.

Put differently, the harness does five jobs at once. It tells the agent what matters. It tells the agent where truth lives. It tells the agent what it is allowed to do. It tells the agent how to know whether it succeeded. And it preserves continuity so work can continue after the current context window ends.

That is why the OpenAI case is so instructive. The interesting thing was not merely that Codex could generate code. The interesting thing was how much work went into making the codebase legible to the agent. OpenAI describes moving away from a giant monolithic AGENTS.md file and instead using a short map-like entry point that pointed into a structured, versioned docs directory. The knowledge base lived in the repository as the system of record, not in scattered chat threads or tribal memory. They enforced freshness and structure mechanically with linters and CI jobs. They even ran "doc-gardening" agents to identify stale documentation and open fix-up pull requests. (OpenAI)
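The mechanics of that kind of freshness enforcement are simple enough to sketch. Everything below is an illustrative assumption rather than OpenAI's actual tooling: the `docs/` layout, the `last-reviewed:` front-matter line, and the 90-day budget are all invented for the example.

```python
from __future__ import annotations

import datetime as dt
from pathlib import Path

STALE_AFTER_DAYS = 90  # assumed freshness budget, not OpenAI's actual number

def find_stale_docs(docs_dir: Path, now: dt.datetime | None = None) -> list[Path]:
    """Return docs whose 'last-reviewed:' stamp is older than the budget.

    Convention (assumed for this sketch): every doc starts with a line like
    'last-reviewed: 2026-01-15'. Docs missing the stamp count as stale too,
    so unreviewed knowledge cannot hide from the gardening pass.
    """
    now = now or dt.datetime.now()
    stale: list[Path] = []
    for doc in sorted(docs_dir.rglob("*.md")):
        first = doc.read_text(encoding="utf-8").splitlines()[:1]
        stamp = None
        if first and first[0].startswith("last-reviewed:"):
            try:
                stamp = dt.datetime.fromisoformat(first[0].split(":", 1)[1].strip())
            except ValueError:
                pass  # malformed stamp is treated the same as no stamp
        if stamp is None or (now - stamp).days > STALE_AFTER_DAYS:
            stale.append(doc)
    return stale
```

A CI job that fails when this list is non-empty, or a gardening agent that opens fix-up pull requests for each entry, is what turns documentation freshness from a norm into a mechanism.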
That idea should land like a warning for every executive team.

What the Agent Cannot See, It Cannot Reason About

In a human-only organization, a surprising amount of work still runs on informal context. A decision gets made in a meeting. A product principle lives in someone's head. An exception is understood but never documented. A workaround sits in Slack. A sacred rule exists only as institutional folklore.

Humans are often bad at this, but they are surprisingly capable of reconstructing missing context through social interaction, memory, and inference. Agents are not. OpenAI's team put the point plainly: from the agent's point of view, anything it cannot access in context while running effectively does not exist. Knowledge buried in Google Docs, chat threads, or people's heads is invisible until it is encoded into versioned artifacts the agent can inspect. (OpenAI)

This is not just a software lesson. It is a management lesson. The first requirement of the cognitive factory is not intelligence. It is legibility.

A finance organization that wants agents to help with forecasting cannot leave critical assumptions scattered across slide decks, back-channel messages, and the memory of two directors who might be on vacation next week. A legal team cannot expect reliable agentic drafting if precedent, policy nuance, and escalation criteria exist only as oral tradition. A research team cannot ask agents to operate well if the real state of play lives in twenty browser tabs, three notebooks, and one brilliant analyst's intuition.

The BE-only firm can limp along on tacit knowledge longer than it should. The BE+AE firm cannot. To employ agents productively, the organization has to convert tribal knowledge into system knowledge. In the cognitive factory, we all must become librarians—what do we know, where do we store it, can the human-agent teams find it?

That is the first major redesign.

The Continuity Problem

The second is continuity.
One of the most revealing problems in long-running agent systems is that complex work rarely fits inside a single context window. An agent may work for hours, then stop. A fresh session begins later with no native memory of the previous one.

Anthropic described this problem in a November 2025 engineering post on "effective harnesses for long-running agents," comparing it to a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what the previous shift accomplished. Their solution was not to wish the limitation away, but to build a harness around it: an initializer agent on the first run, a coding agent on subsequent runs, a persistent progress file, a structured feature list, an init script, git commits with descriptive messages, and strong rules that each session should make incremental progress while leaving the environment in a clean state for the next session. (Anthropic)

That is an engineering pattern, yes. It is also a general theory of the future firm.

As hybrid organizations mature, many corporate processes will start to resemble shift changes. Work will pass between humans and agents, and between one agent run and another. The organization will not survive because the agent remembers. It will survive because the system preserves continuity.

This is a profound change in what counts as "good management." For decades, managers have praised people who keep a lot in their heads, who can jump into ambiguity, who know the unwritten rules, who understand the hidden dependencies. In the cognitive factory, those traits do not disappear, but they stop being enough. The organization must create handoff artifacts that machines can use: state logs, acceptance criteria, checklists, progress journals, decision memos, test suites, exception reports, and structured records of what has been done and what remains.

The long-running-agent problem turns out to be a mirror held up to the company itself.
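The handoff artifacts described above can be reduced to a very small mechanism: an append-only progress journal that every session replays on startup and appends to on shutdown. This is an illustrative sketch of the pattern, not Anthropic's actual harness; the JSON-lines format and the field names are my assumptions.

```python
from __future__ import annotations

import json
from pathlib import Path

def load_progress(journal: Path) -> list[dict]:
    """Replay the journal so a fresh session, with no native memory of the
    previous one, can reconstruct what was finished and what remains."""
    if not journal.exists():
        return []
    lines = journal.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]

def record_shift(journal: Path, done: list[str], remaining: list[str],
                 clean_state: bool) -> None:
    """Append one session's handoff entry before it stops: work completed,
    work outstanding, and whether the environment was left clean."""
    entry = {"done": done, "remaining": remaining, "clean_state": clean_state}
    with journal.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

The same journal works whether the next shift is another agent run or a human, which is the point: continuity lives in the artifact, not in anyone's memory.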
If your organization cannot survive a shift change without oral tradition, it is not ready for agents.

Anthropic's example is especially useful because it shows the failure modes so clearly. Without the right harness, agents tended to do too much at once, run out of context halfway through, leave a half-implemented mess behind, then force the next session to waste time reconstructing what happened. They also had a tendency to declare the work complete too early. The solution was not a smarter slogan about AI. It was explicit operational structure: a comprehensive feature list, strong incentives to work incrementally, progress files, clean-state requirements, and end-to-end testing. (Anthropic)

Leaders should pay close attention to that pattern. Agentic systems fail in very recognizable corporate ways. They scope-creep. They overclaim completion. They leave undocumented work behind. They optimize locally and miss the true objective. In other words, they do some of the same things humans do. The difference is that agents can do them at machine speed and across far more tasks.

Verification Is Part of Production

That is why verification becomes central. One of the great managerial temptations of the AI era will be to talk endlessly about generation and not nearly enough about validation. But the OpenAI and Anthropic case studies point in the opposite direction. OpenAI made application UI, logs, metrics, and traces directly legible to Codex, wiring tools like Chrome DevTools into the runtime and exposing observability data so the agent could reproduce bugs, validate fixes, and reason about runtime performance. Anthropic found that explicit browser-based end-to-end testing materially improved performance compared with looser notions of testing. (OpenAI)

This suggests that the human role in the cognitive factory is not merely to "check the AI."
It is to design environments in which incorrect work is surfaced quickly, incomplete work is hard to hide, and false completion is difficult to claim.

Verification is not a compliance tax added after production. It is part of production.

That is a subtle but massive shift. In the BE-only firm, managers often supervise activity. In the BE+AE firm, they will increasingly supervise human-agent teams managing distributed, sometimes autonomous cognitive pipelines. Their job will be less about manually inspecting every unit of work and more about designing the instrumentation, constraints, and exception-handling pathways that make reliable work possible at scale.

The Manager as Orchestrator

This is why the role of the manager, the team leader, and the employee begins to converge with the role of the orchestrator.

OpenAI's engineering write-up repeatedly returns to the idea that the engineer's primary work moved up a layer: toward scaffolding, leverage, and control systems. The team focused on repository structure, tool access, domain boundaries, custom linters, architectural constraints, and remediation loops rather than direct line-by-line production. They also describe pushing more and more review effort toward agent-to-agent handling, using agents to review changes locally and in the cloud, then iterating until feedback was satisfied. (OpenAI)

This does not mean humans vanish. It means their marginal value shifts. The valuable human in the cognitive factory is less likely to be the person who can grind through the most repetitive symbolic labor and more likely to be the person who can define the problem correctly, encode standards, spot missing context, set priorities, recognize failure modes, and intervene when judgment is required.

The scarce resource is no longer raw keystrokes. It is human attention. OpenAI says that directly: the team's goal was to maximize "our one truly scarce resource: human time and attention." (OpenAI)
That sentence belongs on the wall of every conference room, physical or virtual, where people are trying to talk seriously about AI.

Because if human attention is the scarce resource, then the economic logic of the firm changes. The question is not simply how many tasks an agent can perform. The question is how the organization should be redesigned so that human judgment is spent only where it creates the most leverage.

This has implications well beyond engineering. In investing, it could mean agents gather filings, reconcile data, summarize calls, flag anomalies, and maintain research logs, while humans define the thesis, challenge assumptions, and decide what matters. In law, it could mean agents draft, compare, annotate, and track precedent, while humans govern interpretation, risk, and client judgment. In operations, it could mean agents monitor workflows, analyze service failures, draft remediation steps, and update documentation, while humans handle escalation and tradeoffs.

Everywhere, the same pattern appears. Humans steer. Agents execute.

Entropy at Machine Speed

But there is another side to this story, and it is not optional to discuss. Agents do not merely accelerate output. They also accelerate entropy.

OpenAI's team notes that a fully agent-generated codebase introduces drift because agents replicate patterns that already exist, including suboptimal ones. At first, humans spent substantial time cleaning up what they explicitly call "AI slop," but that did not scale. Their answer was to encode "golden principles" into the repository and create recurring cleanup processes: background Codex tasks that scan for deviations, update quality grades, and open refactoring pull requests. They compare this to garbage collection and frame technical debt as a high-interest loan that is better paid down continuously than left to compound. (OpenAI)

This may be the most underappreciated insight of the entire agentic turn.
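A scan-and-grade loop of that shape can be sketched mechanically. Everything here is an assumption for illustration: the "golden principles" are toy regex rules, the letter grades and one-deviation-per-grade scoring are invented, and OpenAI describes using background Codex tasks, not pattern matching.

```python
import re

# Toy "golden principles" encoded as forbidden patterns. A real cleanup loop
# would likely use an agent's judgment rather than regexes; these rules are
# placeholders to show the scan -> grade -> queue-for-refactor shape.
GOLDEN_PRINCIPLES = {
    "no bare except": re.compile(r"except\s*:"),
    "no print debugging": re.compile(r"\bprint\("),
    "no TODO left behind": re.compile(r"\bTODO\b"),
}

def grade_source(source: str) -> tuple[str, list[str]]:
    """Scan one file's text for deviations and return (grade, findings).

    Each deviation costs one letter grade (A down to F), so the output can
    both rank files for cleanup and track drift over time.
    """
    findings = []
    for name, pattern in GOLDEN_PRINCIPLES.items():
        for n, line in enumerate(source.splitlines(), 1):
            if pattern.search(line):
                findings.append(f"line {n}: {name}")
    grade = "ABCDF"[min(len(findings), 4)]
    return grade, findings
```

Run on a schedule over the whole repository, a grader like this is the "garbage collection" posture: drift is measured continuously as a system property instead of being discovered as an unpleasant surprise.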
A hybrid organization cannot treat quality decay as an occasional cleanup problem. It has to treat entropy as a first-class management issue. The faster agents move, the more dangerous it becomes to rely on ad hoc heroics. Human taste must be encoded. Rules must be enforceable. Cleanup must be continuous. Drift must be monitored as a system property, not discovered as an unpleasant surprise.

Architecture as Competitive Advantage

That is why the cognitive factory requires architecture.

OpenAI's team enforced layered domain boundaries, dependency rules, structured logging, naming conventions, and file-size limits with custom linters and structural tests, precisely because agents are most effective in environments with strict boundaries and predictable structure. Their point is worth generalizing: in a human-first workflow, many of these rules can feel pedantic; in an agent-first workflow, they become multipliers. Once encoded, they apply everywhere at once. (OpenAI)

This, again, is bigger than software. The AE era will reward companies that can convert preference into policy, policy into mechanism, and mechanism into compounding organizational quality. It will punish firms that rely on loose norms, undocumented exceptions, and heroic individual cleanup. Amazon showed what can happen when the BE firm employs mechanisms to their full potential. (Amazon)

The managerial elite of the next decade will not simply be good delegators. They will be good harness designers. They will know how to make the domain legible. They will know how to specify the work. They will know how to preserve continuity across sessions. They will know how to distinguish between what must be constrained and what can remain flexible. They will know how to build verification into the workflow rather than bolt it on at the end. And they will know that output without architecture is just accelerated disorder.

The Factory Analogy

This is why I keep coming back to the factory analogy.
The industrial revolution did not create prosperity merely by inventing better machines. It created prosperity when firms learned how to arrange machines, workflows, standards, maintenance, supervision, and throughput into coherent systems.

The same thing happened with electrification. Early factories often just swapped steam power for electric motors and left the rest unchanged. The large gains came later, when firms redesigned the whole layout around the new technology.

The same thing happened with computers. Productivity did not explode because organizations bought PCs. It improved when they reorganized themselves around databases, networks, software systems, and new processes.

AI is likely to follow the same path. The first wave will be additive. Companies will sprinkle agents on top of old workflows and celebrate small gains. The second wave will be architectural. Firms will redesign themselves so humans and agents work as an integrated cognitive production system. That is when the large gains are likely to appear.

And that is why the BE-to-AE transition matters so much. It is not fundamentally a story about labor substitution, though that will be the loudest headline. It is a story about organizational redesign. It is about moving from a firm where all cognition is biological to a firm where cognition is partly industrialized through harnessed, monitored, persistent agents.

That does not eliminate the human. It changes where the human sits. The biological employee becomes more valuable not when doing what agents already do well, but when shaping what the system should do, judging what matters, handling ambiguity the harness cannot yet absorb, and encoding hard-won insight back into the organization so that it compounds.

In that sense, the hybrid firm does not make human beings irrelevant. It raises the premium on the best kinds of human contribution. But only if the company is built to take advantage of it.
The Durable Edge

This is the trap many executives will fall into. They will think the decisive asset is access to the smartest model. It probably is not. Models will improve. Capabilities will spread. The more durable edge may come from building the best harness: the clearest system of record, the strongest continuity artifacts, the best instrumentation, the cleanest architecture, the most disciplined verification loops, and the most thoughtful allocation of scarce human attention.

The firms that win will not simply deploy agents. They will redesign themselves so that agents can work. They will turn tribal knowledge into system knowledge. They will turn vague standards into enforceable constraints. They will turn scattered memory into persistent artifacts. They will turn supervision from manual inspection into structured feedback loops. They will turn human talent from repetitive execution toward steering, judgment, and architecture.

In the industrial age, the great organizational task was building factories that organized machines around physical work. In the agentic age, the great organizational task is building cognitive factories that organize biological employees and agentic employees around thought.

That is the calendar change now beginning—from BE to AE.

Acknowledgements

Thanks to Ryan Lopopolo of OpenAI and Justin Young of Anthropic for sharing their powerful experiences with us all. To Sandeep Shashidhara of NVIDIA for drawing this trend to my attention and for our philosophical debates about it, and to Jason Holloway for creating the space for us to experiment. I must also thank two agentic employees who assisted: Cha T. Gpt and Nan O' Banana, for contributing research, editing polish, and the article thumbnail.

My ramblings remain on permanent display at https://johnbrennan.xyz

---

Canonical: https://johnbrennan.xyz/essay/building-the-cognitive-factory