Thesis: Every school intervenes at a specific stage of the modern training pipeline. Reading a training recipe is reading a philosophical argument.

  flowchart LR
      A[Data curation] --> B[Pre-training]
      B --> C[Mid-training / architecture]
      C --> D[Post-training]
      D --> E[Inference-time]
      subgraph Schools
        S1[Scaling/connectionist]:::sc
        S2[Embodied/world models]:::em
        S3[Neuro-symbolic]:::ns
        S4[Active inference]:::ai
        S5[Evolutionary]:::ev
        S6[Causal]:::ca
        S7[Symbolic]:::sy
      end
      S1 --> B
      S2 --> B
      S2 --> C
      S3 --> C
      S3 --> D
      S4 --> C
      S5 --> D
      S6 --> A
      S6 --> C
      S7 --> D
      S7 --> E
      classDef sc fill:#ffd;
      classDef em fill:#dfd;
      classDef ns fill:#ddf;
      classDef ai fill:#fdf;
      classDef ev fill:#fdd;
      classDef ca fill:#dff;
      classDef sy fill:#eee;
  

6.1 Pre-training — the scaling stage

Scaling hypothesis injection. Kaplan et al. "Scaling Laws for Neural Language Models" (arXiv 2001.08361, 2020) and Hoffmann et al. "Training Compute-Optimal Large Language Models" (the Chinchilla paper, arXiv 2203.15556, 2022) formalize the empiricist bet: loss falls as a power law in compute, parameters, and tokens. Sutskever's framing of next-token prediction as lossless compression of the training distribution is Humean induction with a modern compression-theoretic gloss.
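
The parametric form behind the Chinchilla bet fits in a few lines. The fit constants below (E, A, B, alpha, beta) are the values Hoffmann et al. publish for their parametric loss; they are reproduced here purely for illustration, not as a calibrated predictor:

```python
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric loss from Hoffmann et al. (2022):
    L(N, D) = E + A / N^alpha + B / D^beta.
    E is the irreducible entropy of text; the two power-law terms decay
    as parameters N and training tokens D grow."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling either axis lowers loss, with diminishing returns:
base = chinchilla_loss(70e9, 1.4e12)       # roughly Chinchilla's operating point
more_data = chinchilla_loss(70e9, 2.8e12)  # same model, twice the tokens
assert more_data < base
```

The asymptote E is the philosophical payload: under this bet, scale buys you everything except the irreducible entropy of the distribution itself.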

World-model injection. LeCun's JEPA (2022) replaces pixel/token generation with prediction in latent embedding space — a Kantian move: the model should predict the structure of the world, not reproduce its surface. V-JEPA 2 (2025) extends this to video at scale. NVIDIA Cosmos trains world foundation models on robot and autonomous-vehicle video.
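
A toy illustration of the JEPA objective, with random matrices standing in for trained encoders (all shapes and names here are hypothetical, not the published architecture). The point is where the loss lives: between embeddings, never between pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for trained networks (toy linear maps, illustrative only):
W_ctx = rng.normal(size=(128, 64))    # context encoder (a ViT in practice)
W_tgt = rng.normal(size=(128, 64))    # target encoder (an EMA copy in practice)
W_pred = rng.normal(size=(64, 64))    # predictor head

context = rng.normal(size=(8, 128))   # visible patches
target = rng.normal(size=(8, 128))    # masked patches to be predicted

s_y = target @ W_tgt                  # embed the target (stop-gradient branch)
s_x = (context @ W_ctx) @ W_pred      # predict the target's embedding from context
loss = np.mean((s_x - s_y) ** 2)      # distance in representation space,
                                      # never reconstruction of raw pixels
```

Generative pre-training would instead score `context -> target` in pixel/token space; JEPA's wager is that surface detail is noise the model should be free to discard.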

Causal injection. Data-curation choices that preserve interventional and counterfactual structure (domain randomization, deliberately diverse environments, causal data collection) are a probabilistic/causal intervention at the data stage.

6.2 Mid-training and architecture

Architecture is where Kant lives. Every inductive bias is a category:

  • MoE (Mixtral 8×7B, 2023; DeepSeek-V3, 2024; Kimi K2 at 1T parameters; INTELLECT-3, Nov 2025) is specialization-under-routing — weakly modular cognition.
  • State-space models — Mamba (Gu & Dao, arXiv 2312.00752, 2023); Mamba-2 — linear-time sequence modeling; philosophically a dynamical-systems rebuttal to transformer hegemony.
  • Liquid neural networks (Hasani et al., 2021–2025) — continuous-time ODEs; Whiteheadian process metaphysics in PyTorch.
  • Neuro-symbolic modules — Symbolica's categorical deep learning; ExtensityAI's SymbolicAI primitives/contracts; hybrid transformer + verifier (AlphaProof).
  • Memory and retrieval — RAG (Lewis et al. 2020), vector memory, 1M–10M-token contexts (Gemini 1.5/2.5) — an external-memory fix to the Humean bundle problem.
  • Active-inference modules — hierarchical Bayesian generative components (VERSES, RGMs) inject Friston directly.
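
To make the MoE entry concrete, a minimal top-k routing sketch in NumPy. The gate, expert count, and per-expert linear maps are toy stand-ins for the real learned gating network and FFN experts:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Top-k mixture-of-experts routing (Mixtral-style, simplified):
    each token is sent only to its k highest-scoring experts, whose
    outputs are combined with softmax weights over the chosen k."""
    logits = x @ gate_w                               # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]        # indices of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        weights = np.exp(scores) / np.exp(scores).sum()  # renormalize over top-k
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ experts[e])         # expert = its own toy FFN
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.normal(size=(4, d))
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts)   # only 2 of 8 experts run per token
```

The "weakly modular cognition" claim lives in that `topk` line: capacity is specialized, but the routing is soft, learned, and token-by-token rather than symbolic.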

6.3 Post-training — the alignment stage

RLHF (Ouyang et al., InstructGPT, arXiv 2203.02155, 2022) injects human preference as reward — a rough, aggregated approximation of Rousseau's general will.

DPO (Rafailov et al., arXiv 2305.18290, 2023) reparameterizes the RLHF objective in closed form, eliminating the explicit reward model — now widely adopted in open source.
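
The DPO objective itself fits in one function. The log-probabilities below are made-up summed sequence log-probs, and beta=0.1 is a typical but illustrative choice:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO objective from Rafailov et al. (2023):
    -log sigmoid(beta * [(log pi_w - log pi_ref_w) - (log pi_l - log pi_ref_l)]).
    Inputs are summed log-probs of whole responses under the policy and
    the frozen reference model."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss is small; flip the preference and
# the loss grows:
assert dpo_loss(-10.0, -12.0, -11.0, -11.0) < dpo_loss(-12.0, -10.0, -11.0, -11.0)
```

Nothing here samples from a reward model or runs PPO — the preference data trains the policy directly, which is exactly why it spread through open source.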

Constitutional AI / RLAIF (Bai et al., arXiv 2212.08073, 2022) replaces human labelers with a critic LLM governed by a written constitution — dialectic baked into training. OpenAI's deliberative alignment (arXiv 2412.16339, December 2024) extends this into reasoning time.
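
The critique-and-revise loop at the core of Constitutional AI reduces to a simple control structure. The one-principle constitution and string-level critic below are toy stand-ins for an actual critic LLM:

```python
def constitutional_revise(response, constitution, critic):
    """RLAIF-style revision loop (simplified): for each written principle,
    ask the critic to critique and revise the draft; the final revision
    becomes the training target. No human labeler appears in the loop.
    `critic` is any callable (principle, text) -> revised text."""
    for principle in constitution:
        response = critic(principle, response)
    return response

# Toy critic that mechanically enforces one "principle":
constitution = ["avoid exclamation marks"]
critic = lambda p, text: text.replace("!", ".") if "exclamation" in p else text
out = constitutional_revise("Sure! Here you go!", constitution, critic)
# out == "Sure. Here you go."
```

The dialectic is literal: thesis (draft), antithesis (critique under a principle), synthesis (revision), iterated over the constitution.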

RLVR — Reinforcement Learning from Verifiable Rewards — is the defining post-training paradigm of 2025. Rewards come from programmatic verifiers (unit tests, math checkers, Lean proofs, compilers), which are far harder to game than learned reward models. DeepSeek-R1's GRPO (no critic, group-relative baselines, rule-based rewards) plus the "reasoning gyms" ecosystem (Prime Intellect's Environments Hub and verifiers library, INTELLECT-2/3, Reasoning Gym, arXiv 2505.24760) constitute the full open-source stack.
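
Group-relative baselines are the heart of GRPO and fit in a few lines. The 0/1 rewards below stand in for a rule-based verifier such as a unit-test runner:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage as used in DeepSeek-R1's GRPO: no learned
    critic. Each sampled completion's reward is standardized against its
    own group (the G rollouts for the same prompt)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # guard against zero spread
    return [(r - mu) / sigma for r in rewards]

# Verifiable reward: 1.0 if the unit tests pass, else 0.0.
group = [1.0, 0.0, 0.0, 1.0, 0.0]
adv = grpo_advantages(group)
# Passing completions get positive advantage, failing ones negative;
# the group's own statistics replace the value network.
```

Dropping the critic is what makes the recipe cheap enough for the open-source "reasoning gym" ecosystem to replicate.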

Evolutionary post-training. Sakana's model merging is evolution applied after training is ostensibly done — finding populations of specialists rather than one generalist. AlphaEvolve (DeepMind, May 2025) uses an LLM-as-mutator plus evolutionary selection to discover new algorithms — including a new 4×4 matrix-multiplication procedure beating Strassen.
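
A toy version of evolutionary merging: random linear maps stand in for checkpoints, and mean-squared error on a synthetic task stands in for benchmark fitness. Everything here is illustrative (a (1+1) hill-climber over merge coefficients), not Sakana's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(weights, models, eval_x, eval_y):
    """Score a candidate merge: mix the parent checkpoints with the given
    coefficients and measure (negated) error on a held-out task."""
    merged = sum(w * m for w, m in zip(weights, models))
    pred = eval_x @ merged
    return -np.mean((pred - eval_y) ** 2)

# Two "specialist" checkpoints and a toy task whose best mix we know:
models = [rng.normal(size=(8, 8)) for _ in range(2)]
eval_x = rng.normal(size=(32, 8))
eval_y = eval_x @ (0.3 * models[0] + 0.7 * models[1])

# Mutate-and-select over merge coefficients, after training is "done":
best = np.array([0.5, 0.5])
best_fit = fitness(best, models, eval_x, eval_y)
for _ in range(200):
    child = best + rng.normal(scale=0.05, size=2)   # mutate
    f = fitness(child, models, eval_x, eval_y)
    if f > best_fit:                                # select
        best, best_fit = child, f
```

The search space is merge coefficients, not gradients — which is why it can find combinations of specialists that no single training run would produce.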

6.4 Inference-time — the new scaling axis

o1 (September 2024), o3 (December 2024), DeepSeek-R1 (January 2025), Gemini Deep Think (2025), Kimi k1.5, Qwen QwQ-32B all exploit a second scaling axis: test-time compute. Snell et al. (arXiv 2408.03314, DeepMind/Berkeley, August 2024) showed compute-optimal allocation of test-time compute can beat a 14× larger model in FLOP-matched evaluation.
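
Best-of-N sampling is the simplest of the test-time-compute allocations Snell et al. compare. A sketch with a hypothetical noisy sampler and an exact answer checker — the extra FLOPs go into n samples rather than a larger model:

```python
import random

def best_of_n(prompt, sample, verify, n=8):
    """Best-of-N test-time compute: draw n candidate answers and keep
    the one the verifier scores highest. One of several allocation
    strategies (alongside sequential revision and tree search)."""
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=verify)

# Toy task: "solve" 17 * 24 with a noisy sampler and an exact checker.
random.seed(0)
sample = lambda p: 17 * 24 + random.choice([-2, -1, 0, 1, 2])
verify = lambda ans: 1.0 if ans == 17 * 24 else 0.0
answer = best_of_n("17*24", sample, verify, n=8)
```

The 14×-larger-model result is about where that compute is best spent: past a point, n more samples through a small model beat one forward pass through a giant one.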

Philosophically, inference-time reasoning is the cleanest instantiation of System 2 in current systems: deliberate, serial, token-budgeted, revisable. It also flirts with the neuro-symbolic school when the chain of thought includes tool calls to verifiers (AlphaProof's RL-on-Lean loop, reasoning models with calculator/code tools).

Pipeline-to-philosophy table

| Stage          | Intervention                 | School                 | Representative work        |
|----------------|------------------------------|------------------------|----------------------------|
| Data curation  | Causal/diverse data          | Causal                 | Schölkopf causal ML        |
| Pre-training   | Scale tokens+params+compute  | Connectionist          | Chinchilla (2022)          |
| Pre-training   | Latent predictive objective  | Embodied               | V-JEPA 2 (2025)            |
| Architecture   | Inductive biases             | Kantian hybrid         | Mamba, MoE, JEPA           |
| Architecture   | Categorical structure        | Symbolic               | Symbolica CDL              |
| Architecture   | Generative Bayesian modules  | Active inference       | VERSES RGMs                |
| Post-training  | RLHF / DPO                   | Dialectic              | Ouyang 2022, Rafailov 2023 |
| Post-training  | Constitutional AI            | Dialectic + rule-based | Bai 2022                   |
| Post-training  | RLVR / GRPO                  | Empirical + formal     | DeepSeek-R1 (2025)         |
| Post-training  | Evolutionary merge           | Evolutionary           | Sakana (2024)              |
| Inference-time | Long chain-of-thought        | Neuro-symbolic         | o1/o3, Deep Think          |
| Inference-time | Search + verifier            | Symbolic + neural      | AlphaProof, AG2            |