Philosophical foundations
Thesis: Modern AI debates recapitulate the rationalist–empiricist split, with Kant standing (unacknowledged) behind every hybrid architecture. The seventeenth- and eighteenth-century arguments are not historical curiosities — they are the load-bearing frames of contemporary ML.
1.1 Descartes: doubt, dualism, and the algorithm
René Descartes (1596–1650) did two things that still matter. First, in the Meditations (1641), he invented the modern method of systematic doubt: strip away everything that can be doubted until you hit bedrock. The bedrock was cogito ergo sum — a thinking thing exists. Second, he split reality into two substances: res cogitans (thinking stuff, non-extended) and res extensa (physical stuff, extended in space). Mind and body are fundamentally different kinds of thing.
Three Cartesian inheritances run through AI. Rationalism — the claim that certain knowledge comes from reason, not experience — is the ancestor of symbolic AI and formal methods: if intelligence is theorem-proving, you want deduction from axioms. Method — the Discourse on Method (1637) is literally an algorithm: break problems into parts, proceed from simple to complex, enumerate exhaustively. Every engineering pipeline echoes this. Dualism is the uncomfortable legacy: the mind-body split makes substrate independence intuitive (minds can be transferred to silicon) but also creates the "hard problem" of consciousness that will haunt Part VII.
Innate ideas are Descartes' most contested contribution. He argued that some concepts — God, self, geometric truths — cannot come from sensation and must be innate. In ML terms, these are priors, inductive biases, architectural structure. Whenever someone argues against tabula rasa — Chomsky against Skinner, Marcus against scaling, LeCun for world models — they are being broadly Cartesian.
1.2 The contenders: Aristotle, Spinoza, Leibniz
Aristotle (384–322 BCE) matters because he refused dualism before it was invented. His hylomorphism — form and matter are inseparable — and his teleological realism (things have intrinsic goals, telos) anticipate modern embodied cognition and organismic views. The Nicomachean Ethics' virtue epistemology (practical wisdom, phronesis, as situated know-how) is the ancestor of Dreyfus's phenomenological critique of symbolic AI.
Baruch Spinoza (1632–1677) proposed monism: there is one substance (Deus sive Natura, God or Nature) with infinite attributes. Mind and body are not two things but two ways of describing one thing. This is the philosophical root of active inference and the free energy principle: Friston's claim that perception, action, and learning are one optimization follows a deeply Spinozist logic — one principle, many surface expressions. Spinoza also gives us a rigorous holism: parts are only intelligible through the whole.
Gottfried Wilhelm Leibniz (1646–1716) is the patron saint of symbolic AI. His dream of a characteristica universalis (a universal symbolic language) and a calculus ratiocinator (a calculus for reasoning) is the direct ancestor of formal logic, Frege, Russell, Gödel, Turing, Lean, and — today — Symbolica, Imandra, and AlphaProof. "When there are disputes among persons, we can simply say: let us calculate." Leibniz also invented monads — individual substances with internal states that mirror the whole — which prefigure agent-based and object-centric representations.
1.3 Empiricism: Locke and Hume
John Locke (1632–1704) wrote the Essay Concerning Human Understanding (1689) and gave us tabula rasa — the mind as blank slate, filled by sensation. Every argument for "just train on enough data" traces back here. The scaling hypothesis is Locke at hyperscale.
David Hume (1711–1776) is the philosopher deep learning engineers should read. Three of his claims are architecturally relevant.
First, the bundle theory of self: there is no unified "I" — only a bundle of perceptions tied together by association. This is distributed representations. An embedding is a Humean bundle; a transformer's internal state is a bundle of attention-weighted perceptions. There is no homunculus reading out meaning — just patterns co-occurring.
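The bundle picture can be made concrete with a toy sketch (toy numbers throughout, not a real model): a token's representation is nothing over and above a softmax-weighted mixture of other tokens' value vectors. No component of the result is "the self"; the bundle just is the weighted co-occurrence.

```python
import numpy as np

# A "Humean bundle" in miniature: a query token's representation is an
# attention-weighted mixture of context tokens' value vectors.
rng = np.random.default_rng(0)
d = 4                              # embedding dimension (toy)
values = rng.normal(size=(3, d))   # value vectors for 3 context tokens

scores = np.array([2.0, 0.5, -1.0])              # raw attention scores
weights = np.exp(scores) / np.exp(scores).sum()  # softmax: a distribution

bundle = weights @ values   # the token's "self": nothing but the mixture

assert np.isclose(weights.sum(), 1.0)  # weights form a distribution
assert bundle.shape == (d,)            # one vector, no homunculus inside
```

Nothing in `bundle` is labeled "meaning"; it is exhausted by the weighted perceptions that compose it — which is exactly Hume's point about the self.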
Second, the problem of induction: no finite number of observations justifies a universal law. The sun rising a million times does not prove it will rise tomorrow. This is the generalization problem. Every scaling law is a bet that induction works anyway.
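Underdetermination, the engine of Hume's problem, takes three lines to exhibit: here are two hypotheses (the names `h1`, `h2` are mine, purely illustrative) that agree on every observation made so far and diverge on the very next one. No amount of in-sample fit settles which law you are living under.

```python
import numpy as np

# Hume's problem as underdetermination: two "laws" identical on all
# observed data, divergent on the next unobserved case.
xs = np.arange(5)        # observations at x = 0..4
ys = np.zeros(5)         # every observation so far: y = 0

h1 = lambda x: 0 * x     # hypothesis 1: "y is always 0"
# hypothesis 2: a polynomial that vanishes at x = 0..4 but not beyond
h2 = lambda x: x * (x - 1) * (x - 2) * (x - 3) * (x - 4)

# Perfect agreement on everything observed...
assert all(h1(x) == h2(x) == y for x, y in zip(xs, ys))
# ...and disagreement tomorrow: h1(5) = 0, h2(5) = 120.
assert h1(5) == 0 and h2(5) == 120
```

A held-out test set shrinks this gap statistically; it cannot close it logically — which is why every scaling law is a bet, not a proof.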
Third, causation as constant conjunction: we never observe causal connection, only regular succession. Correlation is all we have access to empirically. This is exactly the limitation Judea Pearl diagnoses in ML: almost all deep learning lives on Pearl's Rung 1 — association — precisely because it is Humean.
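The Rung 1 limitation can be simulated directly. In this sketch (a standard confounding setup, not drawn from any particular paper), a hidden common cause Z drives both X and Y: observationally they are almost perfectly correlated, yet intervening on X — setting it by fiat, Pearl's do(X) — leaves Y untouched. The two regimes are indistinguishable from correlational data alone.

```python
import numpy as np

# Constant conjunction without causation: Z -> X and Z -> Y,
# but no arrow between X and Y.
rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)                # hidden common cause
x = z + 0.1 * rng.normal(size=n)      # X <- Z (plus noise)
y = z + 0.1 * rng.normal(size=n)      # Y <- Z (plus noise)

obs_corr = np.corrcoef(x, y)[0, 1]    # observational regime: near 1

x_do = rng.normal(size=n)             # do(X): X set independently of Z
y_do = z + 0.1 * rng.normal(size=n)   # Y's mechanism is unchanged
int_corr = np.corrcoef(x_do, y_do)[0, 1]  # interventional regime: near 0

assert obs_corr > 0.9        # "constant conjunction" in the data
assert abs(int_corr) < 0.05  # wiggling X does nothing to Y
```

A model fit only to the observational rows would confidently predict Y from X — and be refuted by the first intervention, which is Pearl's diagnosis in two dozen lines.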
1.4 Kant: the synthesizer
Immanuel Kant (1724–1804) is the unacknowledged patriarch of modern ML architecture because he solved — or at least re-framed — the rationalist/empiricist dispute. In the Critique of Pure Reason (1781), he argued that knowledge requires both sensory input (empiricism) and innate categories that structure that input (rationalism). Space, time, and causation are not learned from experience — they are the preconditions of having any experience at all. "Thoughts without content are empty, intuitions without concepts are blind."
In ML terms, Kant is the first neuro-symbolic architect. Raw data (intuitions) + inductive biases (categories) = cognition. Every architectural choice — convolutions encoding translation equivariance, self-attention (absent positional encodings) encoding permutation equivariance over tokens, graph nets encoding relational structure, JEPA predicting in latent space — is a Kantian category smuggled in.
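One of these categories can be verified rather than merely asserted. The sketch below (a hand-rolled circular 1-D convolution, so it stays self-contained) checks the defining property: shift the input and the feature map shifts with it. The network never has to learn from data that a pattern is the same pattern wherever it occurs — the architecture presupposes it, exactly as Kant's categories precede experience.

```python
import numpy as np

# Translation equivariance by construction: conv(shift(x)) == shift(conv(x)).
def conv1d(x, k):
    # Circular 1-D convolution with a length-3 kernel (toy setup).
    return sum(k[i] * np.roll(x, 1 - i) for i in range(len(k)))

rng = np.random.default_rng(0)
x = rng.normal(size=16)            # a random 1-D "signal"
k = np.array([0.2, 0.5, 0.3])      # an arbitrary kernel

shift = lambda v: np.roll(v, 4)    # translate by 4 positions

# The precondition holds for any x and k, not just these:
assert np.allclose(conv1d(shift(x), k), shift(conv1d(x, k)))
```

The assertion passes for every input and kernel, not just these: the symmetry is a structural precondition of the operation, which is the Kantian point.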
The 2025 consensus — that pure scaling is necessary but not sufficient, and that world models, causal priors, or formal structure must be added — is Kantianism reborn. Bengio's "Consciousness Prior" (2017) explicitly proposes a sparse-factor prior as an innate architectural bias toward System 2 reasoning. Lake, Ullman, Tenenbaum & Gershman's "Building Machines That Learn and Think Like People" (BBS 2017) is perhaps the most Kantian ML paper ever written.
1.5 The durable axis: reductionism vs. holism
Everything that follows can be read on one axis. Reductionism says intelligence decomposes into simpler parts — atoms, features, tokens, FLOPs. Add enough and you get a mind. Holism says intelligence is irreducibly systemic — it requires embodiment, world, feedback, context. Parts are only intelligible through the whole.
| Reductionist pole | Holist pole |
|---|---|
| Descartes (analysis) | Aristotle (hylomorphism) |
| Locke, Hume (atoms of sensation) | Spinoza (one substance) |
| Logical atomism | Process philosophy |
| GOFAI | Cybernetics |
| Scaling hypothesis | World models, embodiment |
| LLM-as-compression | Active inference |
Every modern AI lab is positioned somewhere on this axis. Keep it in mind.