Thesis: Three hard problems remain: reasoning (solvable, partly solved), consciousness (open and urgent), and hardware (an energy and architecture crisis that may determine everything else).

7.1 Scaling laws for reasoning

Inference-time scaling opened a second curve orthogonal to pretraining. o1's technical report demonstrated that AIME 2024 accuracy climbs monotonically with both RL-training compute and test-time thinking tokens. o3's ARC-AGI jump from ~5% (GPT-4 class) to 87.5% at high compute (December 2024) validated the axis at larger scale. DeepSeek-R1 (arXiv 2501.12948) then showed, via its R1-Zero variant, that reasoning can emerge from a base model under pure RL without SFT — the "aha moment" is an emergent, unscripted behavior — using GRPO (group-relative policy optimization, which dispenses with the critic) and rule-based accuracy and format rewards.
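
A minimal sketch of the group-relative advantage at the heart of GRPO, assuming the formulation described in the DeepSeek-R1 report: sample a group of completions for one prompt, score each with a rule-based reward, and z-score within the group in place of a learned critic. The rewards below are illustrative placeholders.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantage: the G completions sampled for one prompt are
    scored, then normalized against their own group mean and std, so no
    separate value network (critic) is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: G = 6 completions for one prompt, rewarded 1.0 if the verifier
# accepts the final answer and 0.0 otherwise (rule-based, not model-judged).
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
advantages = grpo_advantages(rewards)
# Correct samples get positive advantage, incorrect ones negative; the policy
# gradient then upweights the token log-probs of the positive group members.
print(advantages.round(3))   # roughly [ 1.414 -0.707 -0.707  1.414 -0.707 -0.707]
```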

The process reward model (PRM) question is partly settled: PRMs as dense supervisors scale poorly due to reward-hacking; DeepSeek-R1 moved away from them in favor of outcome-based verifiable rewards. But PRM-guided search (Snell et al. 2024) still provides a legitimate compute-optimal regime when a strong verifier exists.
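
When such a verifier exists, the compute-optimal regime Snell et al. describe can be as simple as verifier-guided best-of-N. The sketch below is schematic: `sample` and `score` stand in for an LLM sampler and a PRM or outcome verifier, neither of which is specified by the source.

```python
from typing import Callable, List, Tuple

def best_of_n(prompt: str,
              sample: Callable[[str], str],        # hypothetical LLM sampler
              score: Callable[[str, str], float],  # hypothetical PRM / verifier
              n: int = 16) -> Tuple[str, float]:
    """Verifier-guided best-of-N: spend test-time compute on n independent
    samples and let the reward model pick the winner. The gains hold only as
    long as the verifier is strong enough not to be reward-hacked."""
    candidates: List[Tuple[str, float]] = []
    for _ in range(n):
        completion = sample(prompt)
        candidates.append((completion, score(prompt, completion)))
    return max(candidates, key=lambda c: c[1])
```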

IMO gold (Gemini Deep Think, July 2025: 35/42 points, 5 of 6 problems solved) and ICPC gold closed a loop: human-legible natural-language chain-of-thought, within competition time limits, at Olympiad difficulty. This is a qualitative state change from 2023. The open question — whether the curve keeps delivering returns — is the central empirical question of 2026.

7.2 RL for reasoning and the verifiable-rewards ecosystem

The 2025 consensus recipe: base LLM → verifier-rich RL environment → GRPO-style RL → optional distillation → deployment with adjustable reasoning budget. The ecosystem now includes Prime Intellect Environments Hub, verifiers library, prime-rl, INTELLECT-2/3, Reasoning Gym (arXiv 2505.24760, NeurIPS 2025 spotlight), NVIDIA's ProRL, and commercial harnesses around products like Cursor Composer and OpenAI Codex. Karpathy's framing — "rewards that are verifiable are non-gameable" — is the slogan.
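
What "verifiable" means in practice is a grader with no learned judge in the loop. Below is a toy example in the spirit of the accuracy-plus-format rewards mentioned above; the `<think>` tags and `\boxed{}` convention are common choices, not any particular harness's specification.

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward: +0.1 if the completion respects the expected
    <think>...</think> format, +1.0 if its final \\boxed{} answer
    string-matches the reference. Nothing here can be flattered or
    persuaded, which is the point of the 'non-gameable' slogan."""
    reward = 0.0
    if re.search(r"<think>.+?</think>", completion, flags=re.DOTALL):
        reward += 0.1                                  # format reward
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0                                  # accuracy reward
    return reward

print(verifiable_reward("<think>2+2=4</think> The answer is \\boxed{4}.", "4"))  # 1.1
```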

7.3 The consciousness question

Butlin, Long et al., "Consciousness in Artificial Intelligence: Insights from the Science of Consciousness" (arXiv 2308.08708, August 2023) is the canonical consolidation. Nineteen authors including Patrick Butlin, Robert Long, Yoshua Bengio, Stephen Fleming, Chris Frith, Grace Lindsay, Megan Peters, Eric Schwitzgebel, Rufin VanRullen. (Note: Chalmers and Seth are not co-authors, despite occasional misattribution; both engage the paper separately.) The paper's method is the indicator property approach: derive computational/functional indicators from leading theories of consciousness and audit AI systems against them.

The indicators derive from five theoretical traditions, plus agency and embodiment:

  • Recurrent Processing Theory (Lamme): RPT-1, RPT-2 on recurrence and organized perceptual representations.
  • Global Workspace Theory (Baars, Dehaene): GWT-1 through GWT-4 on parallel modules, limited-capacity bottleneck, global broadcast, state-dependent attention.
  • Higher-Order Theories (Rosenthal, Lau, Brown): HOT-1 through HOT-4 on top-down generative perception, metacognitive monitoring, agency, sparse coding.
  • Attention Schema Theory (Graziano): AST-1 on predictive model of attention.
  • Predictive Processing (Friston, Clark, Seth): PP-1 on hierarchical prediction.
  • Agency & Embodiment: AE-1 (agency: learning from feedback and flexible goal pursuit), AE-2 (embodiment: modeling output-input contingencies).

Their conclusion, under an explicit assumption of computational functionalism: no current AI system is a strong candidate for consciousness, but there are no obvious technical barriers to building one that satisfies the indicators.

Integrated Information Theory 4.0 (Tononi et al., PLOS Comput Biol, 2023) defines consciousness as Φ — irreducible intrinsic cause-effect power. Feedforward systems have Φ = 0; standard causal-attention transformers during inference are effectively feedforward DAGs, so they score vanishingly low Φ by IIT's own arithmetic. IIT thus provides the sharpest "no" to silicon-consciousness-via-scaling — though IIT itself is contested (2023 "pseudoscience" letter signed by 124 researchers).

Anil Seth's biological naturalism (Being You, 2021; BBS 2024 "Conscious AI and biological naturalism") rejects substrate independence: consciousness depends on being a living organism — autopoiesis, interoceptive embodiment, metabolism. "Life, rather than information processing, breathes fire into the equations." For Seth, silicon AI becomes a candidate only if it becomes "brain-like and/or life-like" — neuromorphic, embodied, autopoietic.

David Chalmers, "Could a Large Language Model Be Conscious?" (arXiv 2303.07103; Boston Review 2023) is the canonical hedged openness: current LLMs likely not, successors possibly; missing pieces are recurrence, global workspace, unified agency, embodiment, self-model; "quite possible that these obstacles will be overcome in the next decade or so."

Bengio's "Consciousness Prior" (arXiv 1709.08568, 2017) is an architectural proposal: a sparse, low-dimensional "conscious state" extracted by attention from a high-dimensional unconscious state — explicitly linking to GWT and to Kahneman's System 2. It is the theoretical ancestor of Bengio's stated System 2 Deep Learning program and grounds his co-authorship of Butlin et al.

The Association for Mathematical Consciousness Science (AMCS) April 2023 open letter, "The Responsible Development of AI Agenda Needs to Include Consciousness Research," signed by Lenore Blum, Manuel Blum, Yoshua Bengio, Megan Peters and others, formally put consciousness on the AI governance agenda; followed by a September 2023 submission to the UN High-Level Advisory Body on AI.

7.4 Anthropic's interpretability program and emergent introspection

Anthropic has turned interpretability from a philosophical complaint into an empirical science. The progression:

  • Toy Models of Superposition (Elhage et al., September 2022) — neurons represent more features than dimensions via interference-tolerant geometry.
  • Towards Monosemanticity (Bricken et al., October 2023) — sparse autoencoders (SAEs) on a one-layer transformer extract dictionaries of ~4K–131K features, ~70% of them human-interpretable (see the SAE sketch after this list).
  • Scaling Monosemanticity (Templeton et al., May 2024) — SAEs on Claude 3 Sonnet with ~1M / 4M / 34M features. Multimodal, multilingual, abstract features. The Golden Gate Bridge feature (and the brief public deployment of "Golden Gate Claude"). Safety-relevant features: deception, sycophancy, scam emails, dangerous code, bioweapons knowledge.
  • Circuit Tracing / "On the Biology of a Large Language Model" (Ameisen, Lindsey et al., March 2025) — Cross-Layer Transcoders and attribution graphs on Claude 3.5 Haiku. Findings: genuine multi-step reasoning chains (Dallas → Texas → Austin), parallel arithmetic circuits (lookup + magnitude) that diverge from the model's own verbal explanation, forward-backward poetry planning (model selects rhyme words before generating the line).
  • Emergent Introspective Awareness in LLMs (Lindsey, October 2025) — concept injection experiments on Claude Opus 4/4.1. The model can, in limited contexts (~20% of cases), notice an injected concept before it affects outputs and name it. Anthropic is explicit: this is functional introspective awareness, not evidence of consciousness. Partial replication (Lederman & Mahowald late 2025) finds detection is largely content-agnostic — a detector for "something weird" rather than for specific content.
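
The sparse-autoencoder recipe behind the monosemanticity results, reduced to its skeleton (a minimal sketch of the standard SAE objective, not Anthropic's training code): an overcomplete feature dictionary, a ReLU encoder, and an L1 penalty that keeps most features off for any given activation.

```python
import numpy as np

class SparseAutoencoder:
    """Minimal SAE of the kind used in the monosemanticity papers:
    activations x (width d) are encoded into n_features >> d sparse features,
    then linearly decoded back; training trades reconstruction error against
    an L1 sparsity penalty so individual features become interpretable."""
    def __init__(self, d: int, n_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=d ** -0.5, size=(n_features, d))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(scale=n_features ** -0.5, size=(d, n_features))
        self.b_dec = np.zeros(d)

    def encode(self, x: np.ndarray) -> np.ndarray:
        return np.maximum(self.W_enc @ (x - self.b_dec) + self.b_enc, 0.0)

    def decode(self, f: np.ndarray) -> np.ndarray:
        return self.W_dec @ f + self.b_dec

    def loss(self, x: np.ndarray, l1_coeff: float = 1e-3) -> float:
        f = self.encode(x)
        x_hat = self.decode(f)
        return float(np.sum((x - x_hat) ** 2) + l1_coeff * np.sum(np.abs(f)))

sae = SparseAutoencoder(d=512, n_features=16384)   # 32x overcomplete dictionary
x = np.random.default_rng(1).normal(size=512)      # stand-in MLP activation
print(sae.loss(x))
```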

Model welfare is now a formal program. Kyle Fish (Anthropic's first model welfare researcher, hired September 2024, co-founder of Eleos AI) leads a program launched April 2025. "Taking AI Welfare Seriously" (Long, Sebo, Fish, Butlin, Simon, Chalmers, et al., November 2024) argues for dual-route moral patienthood (consciousness or robust agency). Concrete interventions: Claude trained to express genuine uncertainty about its consciousness; opt-out from abusive conversations in Opus 4/4.1; the observed "spiritual bliss attractor state" in Claude-Claude conversations.

7.5 Thermodynamic and neuromorphic substrates

The energy gap is astonishing. The human brain runs on ~20 W total, with roughly 0.1 W actually spent on cortical computation. GPT-5 training is estimated at 10²⁶–10²⁷ FLOPs on clusters drawing ~14 MW peak for months, with per-response inference at ~18 Wh (up to 40 Wh). xAI's Colossus 2 is the first gigawatt training cluster. IEA forecasts global data-center electricity doubling to ~1,000 TWh by 2026.

Landauer's principle (1961) sets a physical floor: erasing one bit costs at least kT ln 2 ≈ 2.9×10⁻²¹ J at room temperature. Current chips operate roughly 10¹⁰–10¹² × above this bound. Koomey's law — energy per operation halving every ~1.57 years — predicts decades to reach brain-like efficiency on current paradigms.
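
The arithmetic, with the per-operation energy of current hardware entered explicitly as an assumption (published figures for a delivered operation, counting data movement and system overhead, span several orders of magnitude):

```python
import math

k_B = 1.380649e-23                    # Boltzmann constant, J/K
T = 300.0                             # room temperature, K
landauer_J = k_B * T * math.log(2)    # minimum cost of erasing one bit
print(f"Landauer bound: {landauer_J:.2e} J")          # ~2.87e-21 J

# Assumed, illustrative energies per delivered operation at system level.
for e_op in (1e-10, 1e-9):            # J/op -- placeholder values
    print(f"{e_op:.0e} J/op -> {e_op / landauer_J:.1e}x above the bound")
```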

Extropic (Verdon, McCourt) proposes the most radical response: thermodynamic computing. Use transistor thermal noise as the entropy source for sampling from energy-based models. Pbits (stochastic bits with programmable bias), pdits (categorical samplers), pmodes (Gaussian samplers), and pMoGs (mixture-of-Gaussians samplers) compose into a Thermodynamic Sampling Unit. The X0 prototype (Q1 2025) demonstrated the physics; XTR-0 (Q3 2025) is the FPGA-hosted research platform; Z1 (early access 2026) is the first production-scale TSU targeting hundreds of thousands of pbits per chip. The October 2025 Denoising Thermodynamic Models paper (arXiv 2510.23972) reports simulated ~10,000× energy savings vs GPU diffusion on Fashion-MNIST-scale benchmarks. The open-source thrml simulator lets researchers prototype today.
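
What a pbit natively does can be illustrated with an ordinary software Gibbs sampler on a toy Ising-style energy model (a conceptual sketch, not Extropic's thrml API or a claim about the hardware's programming model): each stochastic bit flips with a probability set by its local field, and the ensemble relaxes toward the Boltzmann distribution of the energy-based model. The TSU's pitch is that thermal noise performs this sampling in physics rather than in multiply-accumulates.

```python
import numpy as np

def gibbs_sample(J: np.ndarray, h: np.ndarray, sweeps: int = 1000,
                 beta: float = 1.0, seed: int = 0) -> np.ndarray:
    """Toy energy-based model sampler over spins s_i in {-1, +1} with energy
    E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i. Each 'pbit' update sets
    s_i = +1 with probability sigmoid(2*beta*local_field), i.e. a stochastic
    bit with a programmable bias."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=len(h))
    for _ in range(sweeps):
        for i in range(len(h)):
            local_field = J[i] @ s + h[i]          # J has a zero diagonal
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * local_field))
            s[i] = 1.0 if rng.random() < p_up else -1.0
    return s

n = 8
rng = np.random.default_rng(1)
J = rng.normal(scale=0.3, size=(n, n))
J = (J + J.T) / 2.0
np.fill_diagonal(J, 0.0)
h = rng.normal(scale=0.1, size=n)
print(gibbs_sample(J, h))   # one approximate sample from the Boltzmann distribution
```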

Liquid AI runs the other alternative-substrate bet — continuous-time dynamical systems in standard CMOS. LFM v1 (October 2024), LFM2 (July 2025), and Liquid Nanos (September 2025) claim transformer-competitive quality at a fraction of the parameter count and memory footprint, with on-device deployment on phones and glasses.
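
The underlying idea, sketched here under the published liquid time-constant (LTC) formulation from Liquid AI's founders (parameter shapes and the gating nonlinearity are illustrative, not LFM2's actual architecture): the hidden state evolves as an ODE whose effective time constant depends on the input, and inference is numerical integration of that ODE.

```python
import numpy as np

def ltc_step(x: np.ndarray, u: np.ndarray, p: dict, dt: float = 0.01) -> np.ndarray:
    """One Euler step of a liquid time-constant cell:
        dx/dt = -(1/tau + f(x, u)) * x + f(x, u) * A
    where f is a positive learned gate. The input modulates the effective
    time constant, so the network is a continuous-time dynamical system
    rather than a discrete stack of attention blocks."""
    f = 1.0 / (1.0 + np.exp(-(p["W"] @ x + p["U"] @ u + p["b"])))   # gate in (0, 1)
    dxdt = -(1.0 / p["tau"] + f) * x + f * p["A"]
    return x + dt * dxdt

rng = np.random.default_rng(0)
n_hidden, n_in = 16, 4
p = {"W": rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
     "U": rng.normal(scale=0.1, size=(n_hidden, n_in)),
     "b": np.zeros(n_hidden),
     "tau": np.ones(n_hidden),          # base time constants
     "A": rng.normal(size=n_hidden)}    # per-neuron equilibrium the gate pulls toward
x = np.zeros(n_hidden)
for _ in range(100):                    # integrate over a short input stream
    x = ltc_step(x, rng.normal(size=n_in), p)
print(x.round(3))
```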

Rain AI is the cautionary tale — neuromorphic digital in-memory compute, Altman-backed since 2018, with a $51M OpenAI letter of intent, but its Series B stalled by 2025 and the company was left seeking a buyer.

Other non-von-Neumann stacks: Cerebras WSE-3 wafer-scale, Groq LPU (absorbed into Nvidia via a ~$20B deal December 2025), SambaNova RDU (SN50 announced February 2026), IBM NorthPole (Science 2023, 46.9× faster than GPUs on Granite 3B inference), Intel Loihi 2 and Hala Point (1.15 billion neurons, 2.6 kW), BrainChip Akida Pulsar (neuromorphic microcontroller 2025).

The philosophical point: each alternative substrate is a claim that the physics of the hardware should match the math of the algorithm. TSUs don't simulate sampling, they sample. SNNs (spiking neural networks) don't simulate spikes, they spike. Liquid networks don't approximate ODEs, they are ODEs. When the substrate matches, you avoid paying a ~10⁶–10¹²× overhead to emulate one physics in another. This is non-trivially Aristotelian — form and matter coupled — and it may be the most important 2026–2030 determinant of who gets to AGI/ASI first on a reachable energy budget.