The Swarm and the Sage

Date: 2026-04-08
Author: John Brennan
Source: https://johnbrennan.xyz/essay/the-swarm-and-the-sage

AI agent collectives and human Superforecasters represent the leading edge of structured probabilistic forecasting. This essay compares their architecture, capabilities, and limitations — and examines what swarm simulation means for the Forecast and Alert stages of the Strategic Foresight and Warning cycle.

---

AI Agent Collectives vs. Superforecasters — and What It Means for Strategic Warning

TL;DR

Two methodologies — Philip Tetlock's Superforecaster collectives and AI swarm systems like MiroFish — represent the leading edge of structured probabilistic forecasting. Both are aggregation systems; both take diversity seriously as a design principle. But they diverge sharply on speed, transparency, calibration history, and manipulation surface. For the Strategic Foresight and Warning cycle, swarm simulation transforms the Forecast and Alert stages: removing the analyst-capacity bottleneck, enabling second- and third-order effect modeling, and making alert thresholds dynamically recalibrated rather than static. The optimal architecture integrates both — Superforecaster-style aggregation for point estimates, swarm simulation for scenario mapping — with human-in-the-loop review throughout.

Key Takeaways

- Superforecasters beat professional intelligence analysts with classified access by roughly thirty percent on calibration — not by having better data, but by thinking better.
- MiroFish constructs synthetic populations of 500–2,000 AI agents with persistent memory and distinct personas, then analyzes emergent behavior to generate forecasts — a fundamentally different architecture from human aggregation.
- Both approaches share core design DNA: aggregation, diversity, and weighted pooling. They diverge in transparency, speed, temporal horizon, calibration track record, and adversarial attack surface.
- At the Forecast stage, swarm simulation removes analyst-capacity constraints and enables second- and third-order effect modeling that linear methods consistently miss.
- At the Alert stage, swarm simulation enables continuously recalibrated threshold systems and enriched alert products that explain the agent-level dynamics driving macro-level outcomes.
- MiroFish is weeks old, with no independent peer-reviewed calibration data. Use it as a hypothesis generator, not yet as a standalone authoritative forecast source.
- Adversarial seed-data injection is a real and underexplored attack surface for swarm systems — alert architectures must include provenance verification and human review gates.

The Superforecaster Experiment

In 2011, the Intelligence Advanced Research Projects Activity (IARPA) sponsored a tournament asking ordinary citizens — not intelligence professionals, not credentialed experts — to answer geopolitical forecasting questions. The questions were hard: Would North Korea conduct a nuclear test in the next six months? Would the Eurozone lose a member state? The forecasts were scored using Brier scores, where zero is perfect and two is the worst possible.

What Tetlock and his colleagues found was startling. A small subset of participants — roughly two percent — consistently outperformed not just the crowd, but professional intelligence analysts with access to classified information. These Superforecasters beat the intelligence community by approximately thirty percent on calibration. They did not have better data. They had better thinking.
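The scoring and aggregation machinery behind these results can be sketched in a few lines of Python. The two-sided Brier form and the extremizing transform below are standard textbook versions, not IARPA's or the Good Judgment Project's actual code, and the exponent value is illustrative.

```python
def brier_score(p_yes: float, outcome: bool) -> float:
    """Two-sided Brier score for a binary question: squared error
    summed over both outcome categories, so 0.0 is a perfect
    forecast and 2.0 is the worst possible."""
    o_yes = 1.0 if outcome else 0.0
    return (p_yes - o_yes) ** 2 + ((1.0 - p_yes) - (1.0 - o_yes)) ** 2

def extremize(p: float, a: float = 2.5) -> float:
    """Push an aggregate probability away from 0.5 (a > 1 sharpens,
    a = 1 leaves it unchanged) -- one common extremizing transform."""
    return p ** a / (p ** a + (1.0 - p) ** a)

def aggregate(forecasts, weights=None, a=2.5):
    """Track-record-weighted mean of individual forecasts, then
    extremized to extract signal from many independent minds."""
    if weights is None:
        weights = [1.0] * len(forecasts)
    mean = sum(w * p for w, p in zip(weights, forecasts)) / sum(weights)
    return extremize(mean, a)
```

A confident miss (0.9 on a question that resolves "no") scores 1.62 against 0.5 for a coin-flip forecast, which is why the rule rewards calibration rather than boldness alone; and a pool of independent forecasts clustered around 0.75 extremizes to well above 0.8, mirroring the way the pooled algorithm pushes confident predictions further from 0.5.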
The methodology Tetlock identified had four pillars: talent-spotting (finding people with the right cognitive dispositions), training (teaching probabilistic reasoning and bias correction), teaming (putting Superforecasters together so they challenged each other's assumptions), and aggregation (mathematically pooling forecasts, with heavier weighting toward those with stronger track records). The aggregation step was particularly important: the algorithm "extremized" probabilities, pushing high-confidence predictions higher and low-confidence ones lower, extracting signal from the noise of many independent minds.

The deeper insight was about the nature of forecasting skill itself. Superforecasters share a distinctive cognitive style: they decompose problems into their component parts, update their beliefs incrementally as new information arrives, and operate in continuous probability space rather than in binary yes/no judgments. Tetlock called them "foxes" — thinkers who know many small things — as opposed to expert "hedgehogs," who know one big thing deeply and tend to over-apply it.

The limitation of the Superforecaster model is a human one: it scales poorly. You cannot manufacture Superforecasters on demand. Identifying them takes time, scoring their track records takes multiple forecast cycles, and even the best human teams are slow. The Good Judgment Project operates on timescales of weeks and months. The world, increasingly, does not wait.

The Swarm: MiroFish and Agent-Based Simulation

MiroFish, an open-source project published in early 2026 by developer 666ghj on GitHub, takes a fundamentally different approach. Rather than finding exceptional human minds and aggregating their judgments, MiroFish constructs synthetic populations — thousands of AI agents, each with distinct personalities, backgrounds, and persistent memory — and lets them interact within a simulated social environment. The emergent behaviors of that population are then analyzed to generate forecasts.
The architecture rests on three critical components working in concert.

The first is GraphRAG — Graph-based Retrieval-Augmented Generation. Before any simulation begins, real-world seed data (news articles, policy documents, social media trends, financial data) is ingested and converted into a relational knowledge graph. This is not a simple vector search database; it is a structured map of entities, relationships, and causal pathways. Every agent in the swarm draws from this shared graph when reasoning about the world, grounding the simulation in reality rather than pure confabulation.

The second is Zep long-term memory. Each agent maintains its own persistent memory across the simulation's time steps — remembering past interactions, updating its beliefs, and carrying its individual history forward. This is what separates MiroFish from simpler multi-agent systems: agents are not reset between interactions. They accumulate experience, which means they can model the kind of belief drift, social influence, and opinion cascades that characterize real human populations.

The third is the ReportAgent — a meta-agent that does not participate in the simulation itself but observes it from above, analyzing population-level traces and synthesizing structured prediction reports with visualizations. The ReportAgent is, in effect, this architecture's Superforecaster equivalent: a final aggregation layer that translates emergent complexity into actionable intelligence.

The simulation pipeline runs in five stages: knowledge graph construction from seed data, agent persona generation and world configuration, parallel autonomous agent interaction, ReportAgent synthesis, and finally an interactive exploration phase in which analysts can "interview" individual agents to understand the micro-level dynamics driving macro-level outcomes.
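The architectural role of persistent memory can be illustrated with a toy agent. The class and the linear belief-update rule below are illustrative assumptions, not Zep's or MiroFish's actual API; the point is only that an agent whose memory survives across interactions can exhibit the belief drift that reset-per-turn systems cannot.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    persona: str
    belief: float                                # stance on a proposition, 0..1
    memory: list = field(default_factory=list)   # persists across time steps

    def interact(self, other: "Agent", susceptibility: float = 0.2) -> None:
        # Record the encounter, then drift toward the peer's belief.
        # Because memory is never reset, later behavior can depend on
        # accumulated history -- the substrate for drift and cascades.
        self.memory.append((other.persona, other.belief))
        self.belief += susceptibility * (other.belief - self.belief)
```

After repeated encounters with a confident peer, an agent's full belief trajectory can be read out of its memory trace — which is the kind of record the interactive "interview" phase exploits.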
The practical scale is currently 500 to 2,000 agents — not the million sometimes claimed in viral posts — but at that range the system has demonstrated what its creators describe as "scarily accurate" predictions of market sentiment and public opinion dynamics. The underlying research framework, OASIS (Open Agent Social Interaction Simulations), was developed by the CAMEL-AI team and validated for simulating several months of social time, with measurable accuracy gains in financial forecasting models.

Compare and Contrast

These two approaches — the human Superforecaster collective and the AI agent swarm — share more architectural DNA than first appears, but diverge sharply in what they can and cannot do.

Both are fundamentally aggregation systems. The Superforecaster tournament pools hundreds of calibrated human minds; MiroFish pools thousands of synthetic agent perspectives. In both cases, the core insight is the same: no single mind — human or artificial — is as reliable as a well-structured collective. The wisdom of crowds, properly managed, beats the expert.

Both take diversity seriously as a design principle. Tetlock's research found that team diversity was more important than individual ability. MiroFish encodes this architecturally: each agent is generated with a distinct personality profile, background, and behavioral disposition. Homogeneous swarms, like homogeneous expert panels, produce overconfident, brittle forecasts.

Both use a form of weighted aggregation. The Good Judgment Project's algorithm up-weights forecasters with strong track records and penalizes overconfidence. MiroFish's ReportAgent performs an analogous function at population scale, identifying statistically significant emergent patterns against the baseline noise of agent interactions.

The divergences, however, are significant.

Transparency and auditability.
Superforecasters can explain their reasoning in plain language — they can walk you through the reference classes they consulted, the analogies they weighed, and the information they updated on. MiroFish's forecasts emerge from the interaction of thousands of agents whose individual decision chains are opaque. The ReportAgent synthesizes patterns, but the micro-level causal pathway from "agent interactions" to "market prediction" is not easily auditable. This matters enormously in high-stakes contexts where analysts must justify their estimates to skeptical decision-makers.

Speed and scalability. This is where MiroFish has a decisive advantage. Running a Superforecaster tournament on a new question takes days to weeks — framing the question, recruiting forecasters, gathering baseline predictions, allowing for updating rounds. A MiroFish simulation, once configured, runs in minutes. At the speed at which modern crises evolve — financial contagion, information operations, geopolitical flashpoints — minutes versus weeks is not a minor operational difference. It is a categorical one.

Temporal horizon and adaptability. Superforecasters excel at near-to-medium-term geopolitical questions with relatively clear resolution criteria: Will X happen in the next ninety days? MiroFish is designed for scenario exploration across horizons of months, and is explicitly built to run "what-if" injections — simulating the downstream social effects of a specific policy change, a market shock, or an information operation. This makes it less a point-estimate forecasting tool and more a scenario stress-testing platform.

Ground truth and calibration. The Good Judgment Project has ten years of Brier-scored track records to validate its claims. MiroFish is weeks old. The "scarily accurate" characterization comes primarily from the project's own creators and early enthusiasts on X. Independent replication and rigorous calibration scoring have not yet been published.
This is not a reason to dismiss MiroFish — every methodology starts somewhere — but it is a reason to treat its accuracy claims with the same probabilistic humility its agents are designed to embody.

Manipulation surface. Superforecaster teams are hard to manipulate at scale: you would need to compromise the reasoning of many independent, trained individuals simultaneously. A swarm system's forecast quality, by contrast, is highly dependent on the quality of its seed data. Adversarial actors who understand MiroFish's architecture could inject biased source material during the knowledge graph construction phase and systematically skew the resulting simulation. This is not hypothetical — it is precisely the attack surface that AI-on-AI security research is beginning to document.

Implications for Forecast and Alert

The Strategic Foresight and Warning (SF&W) cycle — Foresee, Forecast, Alert, Warn — is a structured process for moving from horizon scanning to actionable decision support. Each stage has a distinct function, and the emergence of AI swarm intelligence has different implications for each. The two stages most immediately transformed are Forecast and Alert.

Forecast

The Forecast stage is where raw signals from scanning and monitoring are converted into structured probabilistic assessments: not merely "this could happen" but "this has a sixty-five percent probability of occurring within this timeframe, under these conditions." It is the most technically demanding stage of the cycle, and historically the most resource-intensive.

MiroFish and systems like it represent a fundamental expansion of what is achievable at the Forecast stage. The traditional bottleneck has been analyst capacity: there are only so many trained forecasters, and their attention is finite. A swarm system removes that constraint almost entirely.
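To see the kind of non-linear population dynamics a swarm can surface, consider a toy threshold cascade in the spirit of Granovetter's classic threshold models — an illustration of the mechanism, not MiroFish's engine. Each agent adopts a behavior once the adopting fraction reaches its personal threshold, so near-identical initial shocks can produce wildly different outcomes.

```python
def run_cascade(thresholds, shock):
    """Granovetter-style threshold cascade: an agent adopts a behavior
    once the overall adopting fraction reaches its personal threshold.
    'shock' seeds the initial adopters; iterate to a fixed point."""
    n = len(thresholds)
    adopted = [t <= shock for t in thresholds]
    while True:
        frac = sum(adopted) / n
        updated = [a or t <= frac for a, t in zip(adopted, thresholds)]
        if updated == adopted:
            return frac
        adopted = updated

# An unbroken chain of thresholds lets a single seed adopter cascade
# through the entire population...
chain = [i / 100 for i in range(100)]
# ...while shifting one agent's threshold (0.30 -> 0.32) breaks the
# chain, and the same shock stalls at 30 percent adoption.
gapped = [0.32 if i == 30 else i / 100 for i in range(100)]
```

That discontinuity — full cascade versus stall, from a one-agent difference in the population — is exactly the class of second-order effect a linear extrapolation from the initial shock cannot represent.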
Once a simulation environment is configured, it can be re-run against new seed data in near-real time, generating updated probability distributions as conditions evolve.

More importantly, swarm simulation introduces a capability that structured analytic techniques and even Superforecaster teams struggle to produce: second- and third-order effect modeling. When an analyst asks "what happens if interest rates rise by two percent?", a traditional forecast answers that question directly. A swarm simulation populates a synthetic society with diverse agents and watches what actually emerges — viral narratives, behavioral cascades, unexpected coalitions, emergent resistance — capturing the non-linear social dynamics that linear models consistently miss.

The Superforecaster method remains superior for point-estimate questions with clear resolution criteria, and where analyst accountability matters. The swarm method is superior for open-ended scenario exploration, rapid iteration, and modeling social complexity. The optimal Forecast-stage architecture of the near future will almost certainly integrate both: Superforecaster-style aggregation for calibrated probability estimates on discrete questions, and swarm simulation for mapping the possibility space around those estimates.

Practitioners should be alert to MiroFish's current limitations at this stage: the 500–2,000-agent sweet spot, the dependence on Zep memory for coherence in long simulations, and the absence of peer-reviewed calibration data. Use it as a powerful hypothesis generator, not yet as a standalone authoritative forecast source.

Alert

The Alert stage is where a forecast crosses a threshold — where the probability or imminence of an adverse event reaches the point at which decision-makers must be notified and given options to act. It is distinct from Warning (which is broader and more urgent) in that it is targeted, specific, and actionable.
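The threshold-crossing logic just described can be sketched as a simple gate over successive forecast updates. The threshold value and the trend field are illustrative choices, not a standard from the SF&W literature; the point is that an alert should carry trajectory, not only a point estimate.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Alert:
    probability: float   # latest estimate for the adverse event
    trend: float         # change since the previous update

def check_alert(history: List[float], threshold: float = 0.7) -> Optional[Alert]:
    """Fire only when the newest probability crosses the threshold,
    attaching the recent trend so the product conveys direction and
    speed of change alongside the estimate itself."""
    if not history or history[-1] < threshold:
        return None
    trend = history[-1] - history[-2] if len(history) > 1 else 0.0
    return Alert(probability=history[-1], trend=trend)
```

A rising series that has not yet crossed the gate stays silent; the same series one update later fires with both the estimate and the slope that produced it.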
A well-constructed alert tells a decision-maker: here is what is happening, here is how confident we are, here is the relevant time window, and here are the decision options available.

The Alert stage has historically been constrained by the speed of analysis. Events can evolve faster than analysts can update their assessments, leaving decision-makers receiving alerts about situations that have already changed. This is the core problem that swarm simulation addresses most directly. Because MiroFish can re-simulate against new seed data in minutes, it creates the possibility of a continuously updated alert threshold system — a dashboard in which alert conditions are not static trigger points but are dynamically recalibrated as the simulation population responds to incoming real-world data. An alert would not just say "the probability of X has crossed seventy percent." It would say "the simulated population is exhibiting early-stage cascade behavior consistent with past X events, with current trajectory suggesting threshold crossing in fourteen to twenty-one days." That is a qualitatively different and more useful alert product.

There is also an underexplored opportunity at the Alert stage in the swarm's interactive exploration capability. After a simulation run, analysts can query individual agents directly — essentially conducting structured interviews with synthetic members of the affected population. This means alert products can be enriched with the "why behind the what": not just a probability estimate, but a granular account of the agent-level dynamics driving the outcome. That explanatory depth is precisely what decision-makers need to select among competing response options.

The risks at the Alert stage are also real and must be managed. If swarm outputs are integrated directly into automated alert thresholds without human-in-the-loop review, adversarial seed data injection becomes a vector for false alerts — potentially more dangerous than missed ones.
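One concrete first-line control against seed-data injection is content-level provenance checking before anything enters graph construction. The hash-allowlist sketch below is illustrative (the helper names are hypothetical, and this is not a MiroFish feature); a real deployment would also track source signatures and chain of custody.

```python
import hashlib

def digest(doc: bytes) -> str:
    """Content hash used as a provenance fingerprint."""
    return hashlib.sha256(doc).hexdigest()

trusted_digests = set()  # populated as sources are vetted by analysts

def register_trusted(doc: bytes) -> None:
    trusted_digests.add(digest(doc))

def verify_seed(doc: bytes) -> bool:
    """Admit a document into knowledge-graph construction only if its
    hash matches a vetted source; any tampering changes the hash, so
    the document is instead routed to human review."""
    return digest(doc) in trusted_digests
```

This catches tampering with known documents, not biased-but-genuine sources — which is why the human review gate and calibration audits remain necessary alongside it.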
Alert architectures incorporating swarm simulation must include provenance verification for all seed data, human analyst review gates before alert issuance, and regular calibration audits comparing swarm predictions against real-world outcomes.

Conclusion

The Superforecaster experiment proved that the aggregation of calibrated, diverse human minds can outperform experts with privileged access to classified information. It was a quiet revolution in epistemic humility — a demonstration that how you think matters more than what you know. MiroFish and its successors are attempting something more radical: replacing the human minds in that aggregation with synthetic ones, and running the process at machine speed, at machine scale, across synthetic societies that can be interrogated, stress-tested, and re-run with new assumptions in minutes.

If the calibration results hold up to independent scrutiny — and that remains a significant if — the implications for the Forecast and Alert stages of strategic warning are profound. The bottleneck of analyst capacity disappears. The temporal horizon of actionable forecasting compresses. The richness of scenario exploration expands.

What does not change is the fundamental challenge of the SF&W cycle: the last mile. Forecasts and alerts are only as valuable as the decision-maker's willingness and ability to act on them. No swarm, however sophisticated, can solve the organizational, political, and psychological barriers that cause warning failures. That remains, stubbornly and irreducibly, a human problem.

The sage and the swarm are not competitors. They are, at their best, partners — each compensating for what the other lacks. The future of strategic foresight will belong to practitioners who understand how to deploy both.
Sources

- The Good Judgment Project — Wikipedia
- The Science of Superforecasting — Good Judgment
- MiroFish — GitHub (666ghj)
- MiroFish Technical Deep-Dive — Efficient Coder
- Intelligence, Strategic Foresight and Warning — Red Team Analysis Society
- Methodology of SF&W — Red Team Analysis Society
- OASIS Paper (arXiv)
- AI-Powered Strategic Warnings — SCSP