Agents as Complex Adaptive Systems: A Design and Study Framework
For years, I have been reading about complexity science — not as a passing curiosity, but as a unifying lens through which I try to understand systems of all kinds: humans, societies, cities, multicellular organisms, and now, AI agents. Recently, while I was reading Time's Second Arrow by Robert Hazen and Michael Wong, the argument crystallized. The book proposes what its authors call the law of increasing functional information: that the functional information of a system will increase — that is, the system will evolve — if many different configurations of that system undergo selection for one or more functions. Evolution, in this view, is not limited to biology. It is a universal phenomenon operating across the atomic, chemical, mineral, and biological universe. It is time's second arrow — the force that establishes order where thermodynamics predicts only disorder. From the Big Bang onward, something has been selecting, retaining, and composing configurations that work.
What struck me most was the framework's identification of three fundamental sources of selection that drive this process: static persistence, dynamic persistence, and novelty generation. These concepts originate from the foundational PNAS paper by Wong, Cleland, Hazen, and colleagues (2023), which the book expands upon. They are not biological concepts retrofitted onto physics. They are universal principles that apply to any evolving system — minerals, atmospheres, ecosystems, and, I believe, AI agents.
Complex Is Not Complicated
The distinction matters. A complicated system — an aircraft engine, a tax code — has many parts, but its behavior follows from its design. You can trace cause to effect. A complex system is different. It exhibits emergent behavior, non-linearity, feedback loops, and probabilistic outcomes. It surprises you. A city is complex. A market is complex. A human being navigating an unfamiliar problem is complex.
When we build agents today, we often build complicated systems: deterministic pipelines with retrieval steps, tool calls, and prompt chains. These work, but they are brittle. They solve the problem they were designed for and fracture when the problem shifts. What I am arguing for is something different — designing agents that are complex adaptive systems by intention, not by accident.
The Harness Is the Developmental Substrate
There is a lot of discourse right now about "agent harnesses." The term sounds sophisticated, but at its core, a harness is the set of layers you build around a foundation model to give it memory, learning, adaptation, fallback strategies, and tool access. It is, in less fashionable language, a wrapper. But the framing matters. If you think of a harness as a wrapper, you build it to constrain. If you think of it as a developmental substrate, you build it to enable.
Consider how humans develop. We are born with core components — sensory systems, motor capabilities, a brain architecture shaped by millions of years of evolutionary selection. But we are not born with skills. We acquire them through interaction with the world. We develop mental models, refine strategies, discard what fails, and retain what works. No one is born knowing how to debug a distributed system or cook a biryani. These capabilities emerge from a substrate that permits exploration, feedback, and retention.
An agent harness should do the same thing. It should not encode solutions. It should enable the agent to discover, evaluate, and retain them.
Three Sources of Selection
Hazen and Wong identify three fundamental sources of selection that drive all evolving systems. When I read these, the mapping to AI agents was immediate — and it reframed how I think about what an agent harness needs to provide.
Static Persistence. The most basic function in any evolving system is stability — simply enduring. A mineral persists because its atomic configuration is thermodynamically stable. It does not need to do anything to survive; it just needs to not decay. In agent terms, this is what training produces. Architecture, pre-training, mid-training, post-training — the entire pipeline is itself an evolutionary process. We begin with random initialization (or, more accurately, an initialization shaped by informed architectural inductive biases) and raw data. Through iterative selection — which is what loss minimization functionally is — the model arrives at a stable configuration of weights that encodes capabilities. These are the agent's equivalent of mineral stability: they persist without active maintenance. You do not retrain the model every time it encounters a new customer support ticket. The trained weights simply endure.
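To make the selection framing concrete, here is a toy sketch (an illustration of the logic, not a claim about how training is actually implemented): a random-mutation hill climb in which a candidate configuration of weights is retained only when it performs the function, here lower loss, better. Gradient descent is far more directed than this, but the selective structure is the same: propose, evaluate, retain or discard.

```python
import random

def loss(weights: list[float]) -> float:
    """Toy objective: squared distance from a target configuration."""
    target = [0.5, -1.0, 2.0]
    return sum((w - t) ** 2 for w, t in zip(weights, target))

# Random initialization: the starting configuration.
weights = [random.uniform(-3, 3) for _ in range(3)]

# Iterative selection: propose a perturbed configuration and retain it
# only if it performs the function (lower loss) better.
for _ in range(5000):
    candidate = [w + random.gauss(0, 0.05) for w in weights]
    if loss(candidate) < loss(weights):
        weights = candidate  # selected: this configuration persists

print(loss(weights))  # typically near zero: a stable, persistent configuration
```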
Dynamic Persistence. More interesting than mere stability is the capacity to actively maintain oneself through ongoing energy flows. A living cell is not just stable — it is homeostatic. It actively processes energy, corrects perturbations, and sustains itself far from equilibrium. Hazen and Wong note that dynamic persistence involves correlations between the system and its environment that promote continued existence. In agent terms, this is the runtime self-maintenance layer — memory management, tool orchestration, self-correction loops, context accumulation. A customer support agent that has processed ten thousand tickets is not merely stable; it is actively maintaining and updating its operational knowledge, calibrating its strategies, writing to memory, pruning what no longer serves it. This layer requires ongoing compute (the agent's energy source) and active feedback between the system and its environment. It is the difference between a rock and a living organism, and it is the difference between a static model and a functioning agent.
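Here is a minimal sketch of what this layer might look like, assuming a simple key-value memory with usage-based reinforcement and recency-aware pruning. A production memory system would be far richer, but the feedback structure (write, reinforce, prune) is the point.

```python
import time

class RuntimeMemory:
    """Runtime self-maintenance: write, reinforce, and prune operational
    knowledge as the agent processes work."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self.entries: dict[str, dict] = {}  # key -> {"value", "uses", "last_used"}

    def write(self, key: str, value: str) -> None:
        self.entries[key] = {"value": value, "uses": 0, "last_used": time.time()}
        self._prune()

    def recall(self, key: str) -> str | None:
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry["uses"] += 1               # reinforcement: useful knowledge persists
        entry["last_used"] = time.time()
        return entry["value"]

    def _prune(self) -> None:
        # Homeostasis: when over capacity, discard what no longer serves.
        while len(self.entries) > self.max_entries:
            weakest = min(self.entries,
                          key=lambda k: (self.entries[k]["uses"],
                                         self.entries[k]["last_used"]))
            del self.entries[weakest]
```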
Novelty Generation. This is what Hazen and Wong call "third-order selection" — and it is the most profound. Beyond stability, beyond homeostasis, there exist selection pressures that favor systems capable of open-endedly inventing new functions. A system that can explore new portions of its possibility space may discover new configurations that enable entirely new capabilities. The rise of art, literature, technology, and science in human culture — as the original PNAS paper notes — may reflect this inherent pressure to experiment and discover. In agent terms, novelty generation is the capacity to go beyond known tools and strategies when confronting genuine uncertainty. When a problem does not match any existing skill or mental model, the agent must explore. It tries approaches, evaluates outcomes, and sometimes discovers configurations that are genuinely new — new tool compositions, new reasoning strategies, new ways of structuring information. This is where new functional information is born, tested, and either promoted to dynamic persistence or discarded.
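Here is a schematic of that loop. The `matches` and `evaluate` callables and the candidate strategies are hypothetical stand-ins for real skill-matching, evaluation, and exploration machinery; the shape to notice is explore, evaluate, then promote or discard.

```python
import random
from typing import Callable

Strategy = Callable[[str], str]

def solve(problem: str,
          skills: dict[str, Strategy],
          matches: Callable[[str, str], bool],
          evaluate: Callable[[str], float],
          candidates: list[Strategy],
          threshold: float = 0.8) -> str | None:
    # Dynamic persistence: reuse a known skill when one matches the problem.
    for name, skill in skills.items():
        if matches(name, problem):
            return skill(problem)

    # Novelty generation: nothing matches, so explore the possibility space.
    best_score, best_strategy, best_output = float("-inf"), None, None
    for strategy in random.sample(candidates, len(candidates)):
        output = strategy(problem)
        score = evaluate(output)  # the selection pressure on new configurations
        if score > best_score:
            best_score, best_strategy, best_output = score, strategy, output

    # Selection: promote the discovery to a reusable skill, or let it die.
    if best_strategy is not None and best_score >= threshold:
        skills[f"learned:{problem}"] = best_strategy
    return best_output
```

Discoveries that clear the threshold migrate into the skill store, where the dynamic persistence layer maintains them; everything else is discarded. That migration is exactly the promotion from novelty generation to dynamic persistence described above.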
What makes this framework powerful is that these are not just descriptive categories — they are sources of selection. They are the pressures that determine which configurations survive and which do not. Static persistence selects for stability. Dynamic persistence selects for active self-maintenance. Novelty generation selects for creativity. An agent system that exhibits all three is one that can endure, self-correct, and invent. That is what it means to evolve.
The Hazen-Wong paper also offers a hierarchy of information processing that maps beautifully onto agent capabilities: memory (storage of information acquired through sensing), memory-based prediction (the ability to infer future states from encoded memory), and prediction outside of memory — imagination, counterfactual reasoning, the ability to consider states that have never been observed. Each level promotes persistence in increasingly sophisticated ways. Memory allows encoding associations. Memory-based prediction enables basic causal understanding. Prediction outside of memory enables genuine novelty generation through imagined versions of reality. If you are designing an agent harness, this hierarchy tells you exactly what cognitive infrastructure to provide — and in what order of priority.
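As a sketch of that infrastructure, here is a deliberately crude three-tier interface. The `imagine` method just recombines stored fragments, which is the weakest possible form of out-of-memory prediction, but it makes the hierarchy and its priority ordering concrete.

```python
import random
from collections import Counter, defaultdict

class CognitiveCore:
    """Three tiers of information processing, in order of sophistication."""

    def __init__(self):
        # Tier 1 substrate: observed state -> counts of observed next states.
        self.transitions: defaultdict[str, Counter] = defaultdict(Counter)

    def remember(self, state: str, next_state: str) -> None:
        """Tier 1 (memory): store associations acquired through sensing."""
        self.transitions[state][next_state] += 1

    def predict(self, state: str) -> str | None:
        """Tier 2 (memory-based prediction): infer the likely future state
        from encoded experience."""
        seen = self.transitions.get(state)
        return seen.most_common(1)[0][0] if seen else None

    def imagine(self, state: str) -> str:
        """Tier 3 (prediction outside memory): compose a state never
        observed by recombining fragments of what has been."""
        fragments = [s for nexts in self.transitions.values() for s in nexts]
        if not fragments:
            return state
        return f"{state}+{random.choice(fragments)}"  # a counterfactual state
```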
I also see a connection to the nested learning hierarchy from the Titans paper, where learning occurs not just through gradient updates but across forward passes, hyperparameter schedules, and optimizer states, each operating at different frequencies. The three sources of selection operate at analogous temporal scales: static persistence is established across training runs, dynamic persistence across sessions and deployment cycles, and novelty generation within individual problem-solving episodes. Together, they constitute a continuous evolutionary process that extends from pre-training all the way through runtime.
Problem Agnosticism as a Design Principle
If you accept the CAS framing, a consequence follows immediately: agents should be problem-agnostic.
This is counterintuitive if you come from an engineering mindset where you scope requirements, define interfaces, and build to spec. But complex adaptive systems — humans, ecosystems, economies — are not instantiated with a specific problem in mind. A human is not born to solve one problem in one domain. We face a distribution of challenges that shifts over our lifetime, and our adaptability is precisely what lets us survive that shift.
The same should hold for agents. Consider Claude Code. It is not a biomedical agent or a data engineering agent or a frontend agent. It is a problem-agnostic system with access to information, execution capabilities, and feedback channels. When you point it at a new domain — say, protein structure analysis — it bootstraps competence the way a smart generalist would: reading documentation, trying approaches, correcting errors, building up domain-specific knowledge on the fly.
This is not magic. It is what you get when you provide the right substrate and let exploration do its work. The design implication is clear: instead of building problem-specific pipelines, build general-purpose developmental substrates and let the agent self-organize around the problem.
Self-Organization and the Lego Block Principle
Self-organization is one of the defining properties of complex adaptive systems. No one coordinates the individual neurons that produce consciousness. No one directs the individual traders that produce a market price. Structure emerges from local interactions under the right conditions.
For agents, the "right conditions" means providing the essential building blocks without over-constraining how they are assembled. I think of this as the Lego block principle. You do not build the final structure for the agent. You provide blocks — a file system for persistent memory, a tool registry for capability extension, feedback channels for evaluation, a skill store for reusable strategies — and you let the agent compose them in response to the problem at hand.
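In code, the principle might look like the sketch below. The names are illustrative, not a proposed API; what matters is that each block is optional, and each absence removes a region of the space in which the agent can self-organize.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

@dataclass
class Substrate:
    """The Lego blocks. None of them prescribe a workflow; together they
    define the space of possible self-organization."""
    workspace: Path | None = None                              # persistent memory
    tools: dict[str, Callable] = field(default_factory=dict)  # capability extension
    feedback: Callable[[str], float] | None = None             # evaluation channel
    skills: dict[str, str] = field(default_factory=dict)       # reusable strategies

    def can_remember(self) -> bool:
        # No file system means no persistent memory can ever emerge.
        return self.workspace is not None

    def can_extend(self) -> bool:
        # No tool registry means no capability extension can ever emerge.
        return bool(self.tools)
```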
This requires a specific kind of design discipline. You have to resist the urge to hardcode workflows. You have to trust that given the right blocks, useful structure will emerge. And you have to accept that sometimes it will not, and that failure is itself informative.
But there is a critical caveat. Self-organization is not magic, and it is not unconstrained. An elephant does not develop wings — not because evolution is incapable of producing wings, but because the selection pressures on elephants do not favor them. The same logic applies to agents. You cannot expect an agent to develop capabilities for which it has no substrate. If you do not provide a file system, the agent cannot develop persistent memory. If you do not provide tool access, the agent cannot extend its capabilities. The Lego blocks you choose to provide define the space of possible self-organization. Choose carefully.
Designing the Fitness Landscape
Hazen and Wong's framework describes the sources of selection that exist in nature. But here is where agent design diverges from natural evolution in a crucial way: in natural systems, the selection pressure is given by physics, chemistry, and ecology. In agent systems, the selection pressure is designed. And this is both a tremendous advantage and a subtle danger.
It is an advantage because we can shape the fitness landscape to reward the behaviors we want — accuracy, helpfulness, safety, efficiency. We can design what counts as static persistence (what the model retains from training), what counts as dynamic persistence (what self-correction mechanisms we provide), and what counts as novelty generation (what exploration space we open up). We get to choose the functions for which selection operates.
It is a danger because a poorly designed fitness landscape produces agents that are well-adapted to the wrong thing. Goodhart's Law is, in CAS terms, a statement about misaligned selection pressures. If you select for response length as a proxy for helpfulness, you get verbose agents. If you select for user satisfaction without grounding, you get sycophantic agents. The function you select for determines the direction of evolution.
So when I say agents should self-organize, I do not mean they should be left unsupervised. I mean we should design the selection mechanisms — the feedback loops, the evaluation criteria, the reward signals — with the same care that we design the capability substrate. The agent explores; the selection pressure shapes. Both are necessary. And the Hazen-Wong framework gives us the vocabulary to be precise about what kind of selection we are applying at each level.
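A small sketch of the difference, with `task_succeeded` and `grounded` as hypothetical evaluators you would need to build for your domain. The misaligned version is the Goodhart trap from above; the aligned version scores the functions we actually want, directly.

```python
from typing import Callable

def misaligned_fitness(response: str) -> float:
    # Goodhart's trap: length as a proxy for helpfulness
    # selects for verbose agents, not helpful ones.
    return len(response) / 1000

def aligned_fitness(response: str,
                    task_succeeded: Callable[[str], bool],
                    grounded: Callable[[str], bool]) -> float:
    # Select directly for the functions we want retained.
    score = 0.0
    if task_succeeded(response):
        score += 0.7   # did it actually solve the problem?
    if grounded(response):
        score += 0.2   # is it anchored in evidence rather than flattery?
    if len(response) < 2000:
        score += 0.1   # efficiency as a minor, capped term
    return score
```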
A Hierarchical Methodology for Studying Agent Behavior
The CAS framing does more than inform design. It also provides a methodology for studying agent behavior, and this is where I think the framework becomes genuinely powerful.
Consider how we study complex adaptive systems in humans. We do not start by sequencing the genome. We start with behavioral experiments — questionnaires, controlled tasks, observational studies. The Big Five personality traits, for example, are a behavioral parameterization: a compact set of proxy metrics derived from carefully designed questions that capture meaningful variation in human behavior without requiring us to understand the underlying neuroscience.
Only when behavioral characterization reveals something interesting do we go deeper. We move to neuroimaging — fMRI, EEG — to understand which brain regions are involved. We study the interactions between systems: the complementary learning systems of the hippocampus and the neocortex. And only in specific cases do we go all the way down to genetics and molecular biology.
The same hierarchical approach should apply to studying agents:
Level 1: Behavioral characterization. Design experiments that probe agent behavior under controlled conditions. How does the agent respond to distribution shift? How efficiently does it explore? How does it allocate memory? What are its failure modes under uncertainty? These are the Big Five for agents — proxy metrics that parameterize complex behavior without requiring us to inspect internal representations.
Level 2: Representation analysis. When behavioral characterization reveals an interesting pattern — say, an agent that adapts quickly to new domains versus one that does not — we examine internal representations. What do the activations look like? Are there interpretable structures? How do representations change over the course of adaptation? This is the neuroimaging layer.
Level 3: Mechanistic interpretability. For the deepest questions — why does a specific capability emerge, why does a specific failure occur — we go to the mechanistic level. Sparse autoencoders, circuit analysis, causal tracing. This is the molecular biology of agents.
The key insight is that you do not start at Level 3. You do not crack open the weights to understand why an agent is bad at customer support. You start with behavioral experiments, form hypotheses, and descend only as deep as necessary. Each level is more expensive and more informative, and the levels inform each other.
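To make Level 1 concrete, here is a minimal behavioral probe harness that treats the agent as a black-box callable. The `Probe` structure and the pass-rate metric are illustrative, not a proposed standard; the point is that no weights or activations are touched.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Probe:
    name: str                      # e.g. "distribution_shift_recovery"
    task: str                      # a controlled task probing one dimension
    passed: Callable[[str], bool]  # behavioral success criterion

def characterize(agent: Callable[[str], str],
                 probes: list[Probe],
                 trials: int = 10) -> dict[str, float]:
    """Level 1: measure behavior under controlled conditions.
    Repeated trials average over sampling variance in stochastic agents."""
    return {
        probe.name: sum(probe.passed(agent(probe.task))
                        for _ in range(trials)) / trials
        for probe in probes
    }
```

The output is a compact behavioral profile, the agent's analogue of a Big Five score sheet, and it is this profile that tells you whether descending to Level 2 is worth the cost.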
Connection to Research Threads
This framework connects directly to several lines of inquiry I have been pursuing.
Herbert Simon argued in The Architecture of Complexity that systems which evolve under selection pressure develop modular, nearly decomposable structures — because modularity allows local adaptation without global disruption. In the language of Hazen and Wong, this is a structural prediction: that static persistence at the representation level is what enables dynamic persistence and novelty generation at the system level. The implication for agent design is direct — representations without modular structure lack the basis for robust adaptation. They drift rather than evolve.
The question of model organisms for mechanistic interpretability maps directly onto the CAS framework as well. If you view the agent as a complex adaptive system, you can design model organisms that isolate specific adaptive mechanisms: memory consolidation, skill acquisition, exploration-exploitation tradeoffs. The CAS lens tells you what to look for and how to design experiments that will reveal it.
Toward a Unified View
What I am proposing is not a framework in the software engineering sense — not a library or an API or an architecture diagram. It is a point of view. A way of seeing agents that is grounded in the deepest patterns we know about how complex systems evolve, adapt, and self-organize.
The practical implication is a design checklist. For any agent system, ask:
What is the static persistence — what stable capabilities does the model bring from training, and how robust are they?
What is the dynamic persistence — what mechanisms exist for the agent to actively maintain, accumulate, and refine knowledge through ongoing interaction with its environment?
What is the novelty generation capacity — how does the agent explore new configurations when it encounters genuine uncertainty, and what selection pressures favor the discovery of new functions?
What is the fitness landscape — what feedback loops determine which behaviors are retained and which are discarded, and are they aligned with the functions we actually want?
What is the developmental substrate — what Lego blocks are available for self-organization?
And what is the study methodology — how will you characterize, understand, and improve the agent's behavior at each level of analysis?
These questions are problem-agnostic. They apply whether you are building a coding assistant, a scientific research agent, or an autonomous data engineering pipeline. And they are grounded not in the latest framework release, but in what Hazen and Wong argue is a law of nature — one that has governed the evolution of complex systems from atoms to minerals to organisms to, now, artificial intelligence.
As Wong himself noted in an interview: trying to understand the lawful nature of information could help us reckon with what is happening as we build, deploy, and live alongside these systems. I think he is right. And I think the first step is to see agents not as engineered artifacts, but as evolving systems subject to the same universal pressures that have shaped everything else in this universe.
The tools are secondary. The point of view comes first.