Thesis: Five fault lines define the 2026 conversation. Each is a classical philosophical dispute re-enacted in ML terms, and each has become empirically testable in a way the classical versions were not.

4.1 The Bitter Lesson vs. bio-inspired / cognitive science

Sutton's 2019 essay "The Bitter Lesson" is the charter document of pure scaling. "Researchers have repeatedly tried to build into their systems the human knowledge they think useful… but all of this is a long-term failure." Deep Blue, AlphaGo, AlexNet, GPT: in each case the baked-in knowledge lost to the generic scalable method.

The cognitive-science counter, articulated by Marcus, Tenenbaum, Lake, and Gershman, argues that human intelligence demonstrably uses core priors (object permanence, agents, causality, compositional generalization) and that ignoring them exacts a large cost in sample efficiency. Lake et al.'s "Building Machines That Learn and Think Like People" (BBS, 2017) is the canonical reply.

The 2025–2026 twist: Silver & Sutton's "Welcome to the Era of Experience" (preprint chapter for Designing an Intelligence, MIT Press, April 2025) pushes the Bitter Lesson into a third era. Era 1 (simulation, AlphaGo); Era 2 (human data, GPT); Era 3 (experience — agents generating their own training data via environmental interaction). AlphaProof, DeepSeek-R1, and agentic RL systems are cited as evidence. The counter is immediate: Steven Byrnes and others argue Era 3 without alignment structure is existentially reckless.

4.2 Scaling vs. world models

This is the Altman/Sutton axis vs. the LeCun/Hassabis axis. Altman publicly commits to trillion-dollar infrastructure spend; Brad Lightcap (OpenAI COO) in 2025: "Our scaling laws still hold… there's no reason to believe there's any kind of diminishing return on pre-training." xAI's Colossus 2 (January 2026, Southaven, MS) is the first gigawatt training cluster: scaling as industrial strategy.
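What "our scaling laws still hold" means operationally: held-out loss falls as a smooth power law in compute, so the next order of magnitude of spend buys a predictable improvement. A minimal fit-and-extrapolate sketch; the data here are synthetic and the constants invented (the published Chinchilla-style laws add an irreducible-loss term and separate parameter/data terms):

```python
import numpy as np

# Synthetic stand-in for a sweep of small training runs
# (in practice: measure held-out loss at several compute budgets).
compute = np.logspace(18, 22, 9)                  # training FLOPs
rng = np.random.default_rng(0)
loss = 400.0 * compute ** -0.12 * np.exp(rng.normal(0, 0.01, compute.size))

# A pure power law L = A * C^(-alpha) is linear in log-log space,
# so ordinary least squares recovers (-alpha, log A) directly.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
A, alpha = np.exp(intercept), -slope
print(f"fit: L(C) = {A:.0f} * C^(-{alpha:.3f})")

# The scaling bet, in one line: extrapolate an order of magnitude
# past the data and build the cluster the prediction justifies.
print(f"predicted loss at 1e23 FLOPs: {A * 1e23 ** -alpha:.2f}")
```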

LeCun's counter, hardened through 2024–2025: "LLMs are useful, but they are an off-ramp on the road to human-level AI. If you are a PhD student, don't work on LLMs." In November 2025 he left Meta to start a company focused on JEPA-style world models. Fei-Fei Li's "From Words to Worlds" (November 2025) extends the argument: language is 1D; the world is at least 4D (3D space plus time); spatial intelligence is a different kind of problem.
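For readers who have not met JEPA: the core move, per LeCun's I-JEPA/V-JEPA line of work, is to predict the representation of missing or future content from visible context, entirely in latent space, rather than reconstructing pixels or tokens. A toy sketch of that objective; the networks, dimensions, and data below are stand-ins, not any published architecture:

```python
import torch
import torch.nn as nn

# Minimal JEPA-style objective: predict the *representation* of a
# masked target from the representation of visible context.
D_IN, D_REP = 256, 64   # toy dimensions

encoder        = nn.Sequential(nn.Linear(D_IN, D_REP), nn.GELU(), nn.Linear(D_REP, D_REP))
target_encoder = nn.Sequential(nn.Linear(D_IN, D_REP), nn.GELU(), nn.Linear(D_REP, D_REP))
predictor      = nn.Linear(D_REP, D_REP)
target_encoder.load_state_dict(encoder.state_dict())   # starts as a copy

context = torch.randn(32, D_IN)   # visible part of an observation
target  = torch.randn(32, D_IN)   # masked / future part

z_ctx = encoder(context)
with torch.no_grad():              # stop-gradient through the target branch
    z_tgt = target_encoder(target)

# Loss lives in representation space; no pixel reconstruction anywhere.
loss = nn.functional.mse_loss(predictor(z_ctx), z_tgt)
loss.backward()

# In I-JEPA the target encoder is then updated as an EMA of the encoder:
# for p_t, p in zip(target_encoder.parameters(), encoder.parameters()):
#     p_t.data.mul_(m).add_((1 - m) * p.data)
```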

Hassabis occupies a middle position: scale and scaffold. DeepMind's portfolio (Gemini 3, Deep Think, AlphaFold, AlphaEvolve, Genie 3) is hedged across all of these schools.

4.3 LLMs vs. System 2 reasoning

A year ago, critics said LLMs could not reason. In September 2024, OpenAI's o1 opened a new scaling axis: "The performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute)." Then o3 (December 2024) hit 87.5% on ARC-AGI and 25.2% on FrontierMath. DeepSeek-R1 (January 2025, arXiv 2501.12948) showed, via its R1-Zero variant, that pure RL from a base model with no SFT, using GRPO and rule-based verifiable rewards, elicits emergent self-reflection ("Wait, let me reconsider": the now-canonical "aha moment"). Gemini Deep Think took IMO gold in July 2025 end-to-end in natural language, within the 4.5-hour contest limit.
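The R1 recipe is simple enough to sketch. GRPO drops the learned value model: sample a group of completions per prompt, score each with a rule-based verifier, and normalize rewards against the group's own mean and standard deviation to get advantages. A minimal sketch, with an invented verifier and invented completions; the full objective plugs these advantages into a clipped PPO-style ratio with a KL penalty to the reference policy:

```python
import re
import statistics

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Rule-based reward in the R1 style: 1.0 iff the boxed final
    answer matches the reference; no learned reward model."""
    m = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if m and m.group(1).strip() == gold_answer else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO's core trick: normalize each sample's reward against its
    own group, replacing the critic/value network entirely."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # guard all-equal groups
    return [(r - mu) / sigma for r in rewards]

# One prompt, a group of sampled completions (stand-in strings).
group = [
    r"... so the answer is \boxed{42}",
    r"Wait, let me reconsider ... \boxed{42}",   # the 'aha' pattern
    r"... therefore \boxed{41}",
    r"... \boxed{42}",
]
rewards = [verifiable_reward(c, "42") for c in group]
print(grpo_advantages(rewards))   # positive for correct, negative for wrong
```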

The debate is whether this is genuine System 2 reasoning or very expensive interpolation. Marcus remains skeptical; Bengio (consciousness prior) sees architectural progress; Chollet (ARC) updated to call o3 a "genuine breakthrough" while still maintaining that it fails on tasks humans find easy.

4.4 Reductionism vs. holism in modern ML

The mechanistic interpretability program (Anthropic: Olah, Elhage, Batson, Lindsey) is aggressively reductionist: features, circuits, attribution graphs. Scaling Monosemanticity (May 2024) found millions of interpretable features in Claude 3 Sonnet, including the now-famous Golden Gate Bridge feature. "On the Biology of a Large Language Model" (March 2025) showed genuine multi-step reasoning circuits, parallel arithmetic circuits, and forward planning in poetry (picking the rhyme word first, then writing toward it). "Emergent Introspective Awareness" (Lindsey, October 2025) showed that Claude Opus 4/4.1 can sometimes detect injected concepts before they affect its outputs.
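Concretely, the monosemanticity results rest on sparse autoencoders: decompose a model's internal activations into an overcomplete, mostly-zero dictionary of feature activations, then inspect what each feature fires on. A minimal sketch; the dimensions and sparsity coefficient are illustrative values, not Anthropic's settings:

```python
import torch
import torch.nn as nn

D_MODEL, D_FEATURES = 512, 8192   # features >> model dim (overcomplete)

class SparseAutoencoder(nn.Module):
    """Decompose an activation vector x into sparse features:
    x ~ dec(f), with f = ReLU(enc(x)) mostly zero."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(D_MODEL, D_FEATURES)
        self.dec = nn.Linear(D_FEATURES, D_MODEL)

    def forward(self, x):
        f = torch.relu(self.enc(x))          # sparse feature activations
        return self.dec(f), f

sae = SparseAutoencoder()
acts = torch.randn(64, D_MODEL)   # stand-in for residual-stream activations
recon, feats = sae(acts)

l1_coeff = 1e-3                   # sparsity pressure (illustrative value)
loss = nn.functional.mse_loss(recon, acts) + l1_coeff * feats.abs().mean()
loss.backward()

# After training, a feature counts as interpretable if the inputs that
# maximally activate it share a human-recognizable property
# (the Golden Gate Bridge feature being the famous example).
```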

The holist reply: features are not enough. Seth's biological naturalism, Friston's whole-organism framing, and the embodied-AI school all insist that meaning lives in a system's coupling with a world, not in a list of features. The 2026 consensus is that both views capture something real: reductionist interpretability for alignment and debugging; holist framings for understanding intelligence proper.

4.5 Anthropomorphism vs. evolution as the only proof

A quieter but important debate: is the anthropomorphic frame (GWT, HOT, human-style reasoning) itself a distraction? Some researchers argue the only real proof of general intelligence is evolutionary survival under open-ended pressure — a Stanley-Lehman, Clune-style view. On this view, benchmarks, imitation games, and IMO scores are measuring the wrong thing; AGI will only be recognized in retrospect, after systems have survived and adapted across novel environments. This frame underwrites the evolutionary / open-ended school's skepticism of current benchmark culture.
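For concreteness, the algorithmic core of that school is search without an objective. Novelty search (Lehman & Stanley) scores individuals by distance from an archive of previously seen behaviors rather than by fitness on any task. A toy sketch with an invented behavior space:

```python
import random

def behavior(genome: list[float]) -> tuple[float, float]:
    """Toy behavior characterization: a 2-D summary of what the genome
    'does' (real uses: a robot's final position, a playthrough trace)."""
    return (sum(genome[::2]), sum(genome[1::2]))

def novelty(b: tuple[float, float], archive: list[tuple[float, float]], k: int = 5) -> float:
    """Novelty = mean distance to the k nearest behaviors seen so far."""
    if not archive:
        return float("inf")
    dists = sorted(((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2) ** 0.5 for a in archive)
    return sum(dists[:k]) / min(k, len(dists))

rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(20)]
archive: list[tuple[float, float]] = []

for generation in range(50):
    ranked = sorted(population, key=lambda g: novelty(behavior(g), archive), reverse=True)
    archive.extend(behavior(g) for g in ranked[:3])   # remember where search has been
    parents = ranked[:10]                             # select for novelty alone
    population = [[x + rng.gauss(0, 0.1) for x in rng.choice(parents)]
                  for _ in range(20)]
```

Nothing in that loop knows what "good" means, which is precisely the school's point about benchmarks.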