The training pipeline as philosophical injection point
Thesis: Every school intervenes at a specific stage of the modern training pipeline. Reading a training recipe is reading a philosophical argument.
```mermaid
flowchart LR
    A[Data curation] --> B[Pre-training]
    B --> C[Mid-training / architecture]
    C --> D[Post-training]
    D --> E[Inference-time]
    subgraph Schools
        S1[Scaling/connectionist]:::sc
        S2[Embodied/world models]:::em
        S3[Neuro-symbolic]:::ns
        S4[Active inference]:::ai
        S5[Evolutionary]:::ev
        S6[Causal]:::ca
        S7[Symbolic]:::sy
    end
    S1 --> B
    S2 --> B
    S2 --> C
    S3 --> C
    S3 --> D
    S4 --> C
    S5 --> D
    S6 --> A
    S6 --> C
    S7 --> D
    S7 --> E
    classDef sc fill:#ffd;
    classDef em fill:#dfd;
    classDef ns fill:#ddf;
    classDef ai fill:#fdf;
    classDef ev fill:#fdd;
    classDef ca fill:#dff;
    classDef sy fill:#eee;
```
6.1 Pre-training — the scaling stage
Scaling hypothesis injection. Kaplan et al., "Scaling Laws for Neural Language Models" (arXiv 2001.08361, 2020), formalize the empiricist bet: test loss falls as a power law in compute, parameters, and tokens. Hoffmann et al.'s "Chinchilla" (arXiv 2203.15556, 2022) adds the compute-optimal corollary: tokens and parameters should grow together, roughly 20 tokens per parameter. Sutskever's framing of next-token prediction as lossless compression of the training distribution is Humean induction with a modern compression-theoretic gloss.
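The power-law claim can be made concrete with Chinchilla's parametric loss form, L(N, D) = E + A/N^α + B/D^β. A minimal sketch; the constants below are the approximate fitted values reported by Hoffmann et al. (2022), and the example model sizes are illustrative:

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the approximate fitted values from Hoffmann et al. (2022).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# A balanced allocation (70B params, 1.4T tokens, Chinchilla's operating point)
# predicts lower loss than a lopsided one (4x the params, ~5x fewer tokens):
balanced = chinchilla_loss(70e9, 1.4e12)
lopsided = chinchilla_loss(280e9, 300e9)
```

The comparison is the whole Chinchilla argument in two lines: at fixed compute, tokens matter as much as parameters.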
World-model injection. LeCun's JEPA (2022) replaces pixel/token generation with prediction in latent embedding space — a Kantian move: the model should predict the structure of the world, not reproduce its surface. V-JEPA 2 (2025) extends this to video at scale. Nvidia Cosmos trains world foundation models on robot and AV video.
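The difference between generative and joint-embedding objectives can be sketched in a few lines. Everything here is a toy stand-in (the `encode` function plays the role of a trained JEPA encoder); the point is only where the loss is computed, input space versus embedding space:

```python
import random

random.seed(0)

def encode(x):
    """Stand-in for a trained encoder: project an input to a 2-d embedding
    that keeps coarse structure (mean, range) and discards fine detail --
    the 'surface' a generative loss would be forced to reproduce."""
    return (sum(x) / len(x), max(x) - min(x))

def generative_loss(pred_x, true_x):
    # Pixel/token-space objective: penalizes every surface detail.
    return sum((p - t) ** 2 for p, t in zip(pred_x, true_x)) / len(true_x)

def jepa_loss(pred_emb, target_x):
    # Embedding-space objective: penalizes only abstract structure.
    t = encode(target_x)
    return sum((p - q) ** 2 for p, q in zip(pred_emb, t)) / len(t)

# Two "frames" that share structure but differ in unpredictable noise:
frame_a = [0.5 + random.gauss(0, 0.1) for _ in range(16)]
frame_b = [0.5 + random.gauss(0, 0.1) for _ in range(16)]

# The generative loss charges the predictor for noise it cannot predict;
# the JEPA loss asks only that the latent structure match.
g = generative_loss(frame_a, frame_b)
j = jepa_loss(encode(frame_a), frame_b)
```

A predictor that nails the structure but not the noise is penalized under the first objective and not, in the limit, under the second.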
Causal injection. Data-curation choices that preserve interventional and counterfactual structure (domain randomization, deliberately diverse environments, causal data collection) are a probabilistic/causal intervention at the data stage.
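Why randomization at the data stage matters can be shown with a two-feature toy world (all names here are illustrative): the label depends causally on one feature, while a second, spurious feature is either confounded with the label or deliberately randomized away.

```python
import random

random.seed(3)

def sample_example(randomize_spurious: bool):
    """One training example: the label depends causally on `shape`;
    `color` is spurious. Without randomization, color is perfectly
    confounded with the label."""
    shape = random.choice([0, 1])
    color = random.choice([0, 1]) if randomize_spurious else shape
    label = shape
    return (shape, color), label

def agreement_with_label(feature_idx, data):
    """Fraction of examples where the feature equals the label."""
    hits = sum(1 for feats, y in data if feats[feature_idx] == y)
    return hits / len(data)

confounded = [sample_example(False) for _ in range(1000)]
randomized = [sample_example(True) for _ in range(1000)]
# In confounded data the spurious color predicts the label perfectly;
# under domain randomization only the causal feature survives as a
# stable predictor -- the invariance the causal school wants models to find.
```

This is the data-stage version of the interventionist claim: shuffle everything that is not causal, and only the causal mechanism stays invariant across environments.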
6.2 Mid-training and architecture
Architecture is where Kant lives. Every inductive bias is a category:
- MoE, mixture-of-experts (Mixtral 8×7B, 2023; DeepSeek-V3, 2024; Kimi K2 at 1T parameters; INTELLECT-3, Nov 2025): specialization under routing, weakly modular cognition.
- State-space models — Mamba (Gu & Dao, arXiv 2312.00752, 2023); Mamba-2 — linear-time sequence modeling; philosophically a dynamical-systems rebuttal to transformer hegemony.
- Liquid neural networks (Hasani et al., 2021–2025) — continuous-time ODEs; Whiteheadian process metaphysics in PyTorch.
- Neuro-symbolic modules — Symbolica's categorical deep learning; ExtensityAI's SymbolicAI primitives/contracts; hybrid transformer + verifier (AlphaProof).
- Memory and retrieval — RAG (Lewis et al. 2020), vector memory, 1M–10M-token contexts (Gemini 1.5/2.5) — an external-memory fix to the Humean bundle problem.
- Active-inference modules — hierarchical Bayesian generative components (VERSES, RGMs) inject Friston directly.
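The state-space entry above reduces to a one-line recurrence. A scalar toy version (real SSMs use matrices, and Mamba's actual contribution is making the parameters input-dependent, i.e. "selective"):

```python
def ssm_scan(A, B, C, xs):
    """Linear state-space recurrence: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
    Scalar toy version. Cost is O(T) in sequence length, versus the O(T^2)
    of attention -- the source of the 'linear-time' claim."""
    h, ys = 0.0, []
    for x in xs:
        h = A * h + B * x   # state update: decayed memory plus new input
        ys.append(C * h)    # readout
    return ys

# With |A| < 1 the state is an exponentially decaying memory of past input:
trace = ssm_scan(0.5, 1.0, 1.0, [1.0, 0.0, 0.0])  # each step halves the trace
```

The fixed-size state `h` is the philosophical commitment: the past is compressed into a bounded summary rather than re-attended in full, which is exactly the dynamical-systems stance.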
6.3 Post-training — the alignment stage
RLHF, reinforcement learning from human feedback (Ouyang et al., InstructGPT, arXiv 2203.02155, 2022), injects aggregated human preference as the reward signal: an engineered stand-in for Rousseau's general will.
DPO (Rafailov et al., arXiv 2305.18290, 2023) uses a closed-form reparameterization to eliminate the explicit reward model, optimizing the policy directly on preference pairs; it is now widely adopted in open-source post-training.
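The DPO objective for one preference pair is short enough to write out: the implicit reward is the policy-to-reference log-ratio, and the loss is a logistic loss on the margin between chosen and rejected completions. A sketch on toy log-probabilities:

```python
import math

def dpo_loss(beta, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """DPO loss for one (chosen y_w, rejected y_l) preference pair:
    -log sigmoid(beta * [(log pi(y_w) - log ref(y_w))
                         - (log pi(y_l) - log ref(y_l))]).
    No separate reward model is trained; the log-ratio IS the reward."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy prefers the chosen answer more than the reference does: low loss.
low = dpo_loss(0.1, logp_w=-4.0, logp_l=-9.0, ref_logp_w=-5.0, ref_logp_l=-5.0)
# Policy prefers the rejected answer: high loss.
high = dpo_loss(0.1, logp_w=-9.0, logp_l=-4.0, ref_logp_w=-5.0, ref_logp_l=-5.0)
```

The reference model anchors the optimization: the policy is rewarded not for raw likelihood but for moving toward the chosen answer relative to where it started.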
Constitutional AI / RLAIF (Bai et al., arXiv 2212.08073, 2022) replaces human labelers with a critic LLM governed by a written constitution — dialectic baked into training. OpenAI's deliberative alignment (arXiv 2412.16339, December 2024) extends this into reasoning time.
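The critique-revise loop at the heart of Constitutional AI fits in a few lines. Everything below is a toy stand-in (the `model` and `critic` functions are hypothetical placeholders for the assistant and critic LLMs); the structure, not the content, is the point:

```python
CONSTITUTION = ["Do not reveal personal data.", "Be helpful and honest."]

def model(prompt: str) -> str:
    """Toy stand-in for the assistant LLM."""
    if "revise" in prompt:
        return "I can't share personal data, but here is public info instead."
    return "Sure, here is their home address: ..."

def critic(response: str, principle: str) -> bool:
    """Toy stand-in for the critic LLM: does the response violate
    the principle?"""
    return "address" in response and "personal data" in principle.lower()

def constitutional_pass(prompt: str) -> str:
    """One critique-revise round: draft, check against each principle,
    revise on violation. CAI distills many such rounds into training data
    for supervised fine-tuning and RLAIF."""
    draft = model(prompt)
    for principle in CONSTITUTION:
        if critic(draft, principle):
            draft = model(f"revise per principle {principle!r}: {prompt}")
    return draft
```

The dialectic is literal: thesis (draft), antithesis (critique against a written principle), synthesis (revision), then the synthesis becomes training signal.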
RLVR, reinforcement learning from verifiable rewards, is the defining post-training paradigm of 2025. Rewards come from programmatic verifiers (unit tests, math checkers, Lean proofs, compilers), which are far harder to game than learned reward models. DeepSeek-R1's GRPO (Group Relative Policy Optimization: no critic, group-relative baselines, rule-based rewards) plus the "reasoning gyms" ecosystem (Prime Intellect's Environments Hub and verifiers library, INTELLECT-2/3, Reasoning Gym, arXiv 2505.24760) constitute the full open-source stack.
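The two moving parts, a rule-based verifier and GRPO's critic-free baseline, are both simple. A sketch with a toy exact-match verifier (real systems run unit tests or proof checkers):

```python
import statistics

def verifier_reward(candidate: str, expected: str) -> float:
    """Rule-based verifiable reward: exact-match check, standing in for
    unit tests, math answer checkers, or a Lean proof verifier."""
    return 1.0 if candidate.strip() == expected else 0.0

def grpo_advantages(rewards):
    """GRPO's critic-free baseline: normalize each sampled completion's
    reward against its own group's mean and standard deviation, instead
    of training a learned value function."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

# Sample a group of completions for one prompt, score with the verifier:
group = ["4", "5", "4", "four"]
rewards = [verifier_reward(c, "4") for c in group]
advs = grpo_advantages(rewards)  # correct answers pushed up, wrong ones down
```

The group is its own control: no critic network, no reward model, just relative standing among siblings sampled from the same prompt.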
Evolutionary post-training. Sakana's evolutionary model merging applies selection after training is ostensibly done, searching for populations of specialists rather than one generalist. AlphaEvolve (DeepMind, May 2025) uses an LLM as mutation operator plus evolutionary selection to discover new algorithms, including a 48-multiplication scheme for 4×4 complex-valued matrix multiplication that improves on Strassen's 49.
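Parameter-space merging plus evolutionary search over merge coefficients can be sketched in miniature. This is a deliberately simplified single-coefficient version (Sakana's method also evolves data-flow routing between layers); the "models" and fitness function are hypothetical:

```python
import random

random.seed(1)

def merge(w1, w2, alpha):
    """Parameter-space merge: elementwise interpolation of two weight
    vectors. Sketches only the parameter-space half of evolutionary merging."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(w1, w2)]

def evolve_alpha(w1, w2, fitness, generations=30, pop_size=8):
    """Toy evolutionary search over the single merge coefficient:
    rank by held-out fitness, keep the top half, mutate to refill."""
    pop = [random.random() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: fitness(merge(w1, w2, a)), reverse=True)
        survivors = pop[: pop_size // 2]
        children = [min(1.0, max(0.0, a + random.gauss(0, 0.1)))
                    for a in survivors]
        pop = survivors + children
    return pop[0]

# Hypothetical specialists whose best blend is 70/30:
math_model, code_model = [1.0, 0.0], [0.0, 1.0]
target = [0.7, 0.3]
fitness = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
best = evolve_alpha(math_model, code_model, fitness)
```

No gradients touch the weights; selection on downstream fitness does all the work, which is what makes this "post-training" in the evolutionary school's sense.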
6.4 Inference-time — the new scaling axis
o1 (September 2024), o3 (December 2024), DeepSeek-R1 (January 2025), Gemini Deep Think (2025), Kimi k1.5, Qwen QwQ-32B all exploit a second scaling axis: test-time compute. Snell et al. (arXiv 2408.03314, DeepMind/Berkeley, August 2024) showed compute-optimal allocation of test-time compute can beat a 14× larger model in FLOP-matched evaluation.
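The simplest form of test-time scaling is best-of-N sampling scored by a verifier; Snell et al. study smarter allocations, but the mechanism is the same. A toy sketch in which the "model" and "verifier" are stand-in functions:

```python
import random

random.seed(7)

def model_sample(prompt: str) -> str:
    """Stand-in for one stochastic completion from a fixed, weak model:
    a noisy guess at 6 * 7 that is right only some of the time."""
    return str(random.choice([41, 42, 43, 44, 42, 40]))

def verifier_score(prompt: str, answer: str) -> float:
    """Programmatic check (here: re-derive the arithmetic). Real systems
    use unit tests, proof checkers, or a learned verifier."""
    return 1.0 if answer == str(6 * 7) else 0.0

def best_of_n(prompt: str, n: int) -> str:
    """Spend more test-time compute (larger n) to extract a better answer
    from the same fixed weights."""
    samples = [model_sample(prompt) for _ in range(n)]
    return max(samples, key=lambda s: verifier_score(prompt, s))

answer = best_of_n("What is 6 * 7?", n=64)  # almost surely "42"
```

The model never changes; only the inference budget does, which is why test-time compute is a second, independent scaling axis.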
Philosophically, inference-time reasoning is the cleanest instantiation of System 2 in current systems: deliberate, serial, token-budgeted, revisable. It also flirts with the neuro-symbolic school when the chain of thought includes tool calls to verifiers (AlphaProof's RL-on-Lean loop, reasoning models with calculator/code tools).
Pipeline-to-philosophy table
| Stage | Intervention | School | Representative work |
|---|---|---|---|
| Data curation | Causal/diverse data | Causal | Schölkopf causal ML |
| Pre-training | Scale tokens+params+compute | Connectionist | Chinchilla (2022) |
| Pre-training | Latent predictive objective | Embodied | V-JEPA 2 (2025) |
| Architecture | Inductive biases | Kantian hybrid | Mamba, MoE, JEPA |
| Architecture | Categorical structure | Symbolic | Symbolica CDL |
| Architecture | Generative Bayesian modules | Active inference | VERSES RGMs |
| Post-training | RLHF / DPO | Dialectic | Ouyang 2022, Rafailov 2023 |
| Post-training | Constitutional AI | Dialectic + rule-based | Bai 2022 |
| Post-training | RLVR / GRPO | Empirical + formal | DeepSeek-R1 (2025) |
| Post-training | Evolutionary merge | Evolutionary | Sakana (2024) |
| Inference-time | Long chain-of-thought | Neuro-symbolic | o1/o3, Deep Think |
| Inference-time | Search + verifier | Symbolic + neural | AlphaProof, AG2 |