The training pipeline as philosophical injection point
Thesis: Every school intervenes at a specific stage of the modern training pipeline. Reading a training recipe is reading a philosophical argument.
```mermaid
flowchart LR
    A[Data curation] --> B[Pre-training]
    B --> C[Mid-training / architecture]
    C --> D[Post-training]
    D --> E[Inference-time]
    subgraph Schools
        S1[Scaling/connectionist]:::sc
        S2[Embodied/world models]:::em
        S3[Neuro-symbolic]:::ns
        S4[Active inference]:::ai
        S5[Evolutionary]:::ev
        S6[Causal]:::ca
        S7[Symbolic]:::sy
    end
    S1 --> B
    S2 --> B
    S2 --> C
    S3 --> C
    S3 --> D
    S4 --> C
    S5 --> D
    S6 --> A
    S6 --> C
    S7 --> D
    S7 --> E
    classDef sc fill:#ffd;
    classDef em fill:#dfd;
    classDef ns fill:#ddf;
    classDef ai fill:#fdf;
    classDef ev fill:#fdd;
    classDef ca fill:#dff;
    classDef sy fill:#eee;
```
6.1 Pre-training — the scaling stage
Scaling hypothesis injection. Kaplan et al., "Scaling Laws for Neural Language Models" (arXiv 2001.08361, 2020), formalize the empiricist bet: test loss falls as a power law in compute, parameters, and tokens. Hoffmann et al.'s "Chinchilla" (arXiv 2203.15556, 2022) adds the compute-optimal corollary: tokens and parameters should grow together, roughly 20 tokens per parameter. Sutskever's framing of next-token prediction as lossless compression of the training distribution is Humean induction with a modern compression-theoretic gloss.
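The power-law claim can be made concrete with Chinchilla's parametric loss form, L(N, D) = E + A/N^α + B/D^β. A minimal sketch; the constants below are the approximate fitted values reported by Hoffmann et al. (2022), and the example model sizes are illustrative:

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the approximate fitted values from Hoffmann et al. (2022).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# A balanced allocation (70B params, 1.4T tokens, Chinchilla's operating point)
# predicts lower loss than a lopsided one (4x the params, ~5x fewer tokens):
balanced = chinchilla_loss(70e9, 1.4e12)
lopsided = chinchilla_loss(280e9, 300e9)
```

The comparison is the whole Chinchilla argument in two lines: at fixed compute, tokens matter as much as parameters.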
World-model injection. LeCun's JEPA (2022) replaces pixel/token generation with prediction in latent embedding space — a Kantian move: the model should predict the structure of the world, not reproduce its surface. V-JEPA 2 (2025) extends this to video at scale. Nvidia Cosmos trains world foundation models on robot and AV video.
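The difference between generative and joint-embedding objectives can be sketched in a few lines. Everything here is a toy stand-in (the `encode` function plays the role of a trained JEPA encoder); the point is only where the loss is computed, input space versus embedding space:

```python
import random

random.seed(0)

def encode(x):
    """Stand-in for a trained encoder: project an input to a 2-d embedding
    that keeps coarse structure (mean, range) and discards fine detail --
    the 'surface' a generative loss would be forced to reproduce."""
    return (sum(x) / len(x), max(x) - min(x))

def generative_loss(pred_x, true_x):
    # Pixel/token-space objective: penalizes every surface detail.
    return sum((p - t) ** 2 for p, t in zip(pred_x, true_x)) / len(true_x)

def jepa_loss(pred_emb, target_x):
    # Embedding-space objective: penalizes only abstract structure.
    t = encode(target_x)
    return sum((p - q) ** 2 for p, q in zip(pred_emb, t)) / len(t)

# Two "frames" that share structure but differ in unpredictable noise:
frame_a = [0.5 + random.gauss(0, 0.1) for _ in range(16)]
frame_b = [0.5 + random.gauss(0, 0.1) for _ in range(16)]

# The generative loss charges the predictor for noise it cannot predict;
# the JEPA loss asks only that the latent structure match.
g = generative_loss(frame_a, frame_b)
j = jepa_loss(encode(frame_a), frame_b)
```

A predictor that nails the structure but not the noise is penalized under the first objective and not, in the limit, under the second.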
Causal injection. Data-curation choices that preserve interventional and counterfactual structure (domain randomization, deliberately diverse environments, causal data collection) are a probabilistic/causal intervention at the data stage.
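Why randomization at the data stage matters can be shown with a two-feature toy world (all names here are illustrative): the label depends causally on one feature, while a second, spurious feature is either confounded with the label or deliberately randomized away.

```python
import random

random.seed(3)

def sample_example(randomize_spurious: bool):
    """One training example: the label depends causally on `shape`;
    `color` is spurious. Without randomization, color is perfectly
    confounded with the label."""
    shape = random.choice([0, 1])
    color = random.choice([0, 1]) if randomize_spurious else shape
    label = shape
    return (shape, color), label

def agreement_with_label(feature_idx, data):
    """Fraction of examples where the feature equals the label."""
    hits = sum(1 for feats, y in data if feats[feature_idx] == y)
    return hits / len(data)

confounded = [sample_example(False) for _ in range(1000)]
randomized = [sample_example(True) for _ in range(1000)]
# In confounded data the spurious color predicts the label perfectly;
# under domain randomization only the causal feature survives as a
# stable predictor -- the invariance the causal school wants models to find.
```

This is the data-stage version of the interventionist claim: shuffle everything that is not causal, and only the causal mechanism stays invariant across environments.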
6.2 Mid-training and architecture
Architecture is where Kant lives. Every inductive bias is a category:
- MoE, mixture-of-experts (Mixtral 8×7B, 2023; DeepSeek-V3, 2024; Kimi K2 at 1T parameters; INTELLECT-3, Nov 2025): specialization under routing, weakly modular cognition.
- State-space models — Mamba (Gu & Dao, arXiv 2312.00752, 2023); Mamba-2 — linear-time sequence modeling; philosophically a dynamical-systems rebuttal to transformer hegemony.
- Liquid neural networks (Hasani et al., 2021–2025) — continuous-time ODEs; Whiteheadian process metaphysics in PyTorch.
- Neuro-symbolic modules — Symbolica's categorical deep learning; ExtensityAI's SymbolicAI primitives/contracts; hybrid transformer + verifier (AlphaProof).
- Memory and retrieval — RAG (Lewis et al. 2020), vector memory, 1M–10M-token contexts (Gemini 1.5/2.5) — an external-memory fix to the Humean bundle problem.
- Active-inference modules — hierarchical Bayesian generative components (VERSES, RGMs) inject Friston directly.
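The state-space entry above reduces to a one-line recurrence. A scalar toy version (real SSMs use matrices, and Mamba's actual contribution is making the parameters input-dependent, i.e. "selective"):

```python
def ssm_scan(A, B, C, xs):
    """Linear state-space recurrence: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
    Scalar toy version. Cost is O(T) in sequence length, versus the O(T^2)
    of attention -- the source of the 'linear-time' claim."""
    h, ys = 0.0, []
    for x in xs:
        h = A * h + B * x   # state update: decayed memory plus new input
        ys.append(C * h)    # readout
    return ys

# With |A| < 1 the state is an exponentially decaying memory of past input:
trace = ssm_scan(0.5, 1.0, 1.0, [1.0, 0.0, 0.0])  # each step halves the trace
```

The fixed-size state `h` is the philosophical commitment: the past is compressed into a bounded summary rather than re-attended in full, which is exactly the dynamical-systems stance.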
6.3 Post-training — the alignment stage
RLHF, reinforcement learning from human feedback (Ouyang et al., InstructGPT, arXiv 2203.02155, 2022), injects aggregated human preference as the reward signal: an engineered stand-in for Rousseau's general will.
DPO (Rafailov et al., arXiv 2305.18290, 2023) uses a closed-form reparameterization to eliminate the explicit reward model, optimizing the policy directly on preference pairs; it is now widely adopted in open-source post-training.
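The DPO objective for one preference pair is short enough to write out: the implicit reward is the policy-to-reference log-ratio, and the loss is a logistic loss on the margin between chosen and rejected completions. A sketch on toy log-probabilities:

```python
import math

def dpo_loss(beta, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """DPO loss for one (chosen y_w, rejected y_l) preference pair:
    -log sigmoid(beta * [(log pi(y_w) - log ref(y_w))
                         - (log pi(y_l) - log ref(y_l))]).
    No separate reward model is trained; the log-ratio IS the reward."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy prefers the chosen answer more than the reference does: low loss.
low = dpo_loss(0.1, logp_w=-4.0, logp_l=-9.0, ref_logp_w=-5.0, ref_logp_l=-5.0)
# Policy prefers the rejected answer: high loss.
high = dpo_loss(0.1, logp_w=-9.0, logp_l=-4.0, ref_logp_w=-5.0, ref_logp_l=-5.0)
```

The reference model anchors the optimization: the policy is rewarded not for raw likelihood but for moving toward the chosen answer relative to where it started.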
Constitutional AI / RLAIF (Bai et al., arXiv 2212.08073, 2022) replaces human labelers with a critic LLM governed by a written constitution — dialectic baked into training. OpenAI's deliberative alignment (arXiv 2412.16339, December 2024) extends this into reasoning time.
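The critique-revise loop at the heart of Constitutional AI fits in a few lines. Everything below is a toy stand-in (the `model` and `critic` functions are hypothetical placeholders for the assistant and critic LLMs); the structure, not the content, is the point:

```python
CONSTITUTION = ["Do not reveal personal data.", "Be helpful and honest."]

def model(prompt: str) -> str:
    """Toy stand-in for the assistant LLM."""
    if "revise" in prompt:
        return "I can't share personal data, but here is public info instead."
    return "Sure, here is their home address: ..."

def critic(response: str, principle: str) -> bool:
    """Toy stand-in for the critic LLM: does the response violate
    the principle?"""
    return "address" in response and "personal data" in principle.lower()

def constitutional_pass(prompt: str) -> str:
    """One critique-revise round: draft, check against each principle,
    revise on violation. CAI distills many such rounds into training data
    for supervised fine-tuning and RLAIF."""
    draft = model(prompt)
    for principle in CONSTITUTION:
        if critic(draft, principle):
            draft = model(f"revise per principle {principle!r}: {prompt}")
    return draft
```

The dialectic is literal: thesis (draft), antithesis (critique against a written principle), synthesis (revision), then the synthesis becomes training signal.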
RLVR, reinforcement learning from verifiable rewards, is the defining post-training paradigm of 2025. Rewards come from programmatic verifiers (unit tests, math checkers, Lean proofs, compilers), which are far harder to game than learned reward models. DeepSeek-R1's GRPO (Group Relative Policy Optimization: no critic, group-relative baselines, rule-based rewards) plus the "reasoning gyms" ecosystem (Prime Intellect's Environments Hub and verifiers library, INTELLECT-2/3, Reasoning Gym, arXiv 2505.24760) constitute the full open-source stack.
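The two moving parts, a rule-based verifier and GRPO's critic-free baseline, are both simple. A sketch with a toy exact-match verifier (real systems run unit tests or proof checkers):

```python
import statistics

def verifier_reward(candidate: str, expected: str) -> float:
    """Rule-based verifiable reward: exact-match check, standing in for
    unit tests, math answer checkers, or a Lean proof verifier."""
    return 1.0 if candidate.strip() == expected else 0.0

def grpo_advantages(rewards):
    """GRPO's critic-free baseline: normalize each sampled completion's
    reward against its own group's mean and standard deviation, instead
    of training a learned value function."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

# Sample a group of completions for one prompt, score with the verifier:
group = ["4", "5", "4", "four"]
rewards = [verifier_reward(c, "4") for c in group]
advs = grpo_advantages(rewards)  # correct answers pushed up, wrong ones down
```

The group is its own control: no critic network, no reward model, just relative standing among siblings sampled from the same prompt.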
Evolutionary post-training. Sakana's evolutionary model merging applies selection after training is ostensibly done, searching for populations of specialists rather than one generalist. AlphaEvolve (DeepMind, May 2025) uses an LLM as mutation operator plus evolutionary selection to discover new algorithms, including a 48-multiplication scheme for 4×4 complex-valued matrix multiplication that improves on Strassen's 49.
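Parameter-space merging plus evolutionary search over merge coefficients can be sketched in miniature. This is a deliberately simplified single-coefficient version (Sakana's method also evolves data-flow routing between layers); the "models" and fitness function are hypothetical:

```python
import random

random.seed(1)

def merge(w1, w2, alpha):
    """Parameter-space merge: elementwise interpolation of two weight
    vectors. Sketches only the parameter-space half of evolutionary merging."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(w1, w2)]

def evolve_alpha(w1, w2, fitness, generations=30, pop_size=8):
    """Toy evolutionary search over the single merge coefficient:
    rank by held-out fitness, keep the top half, mutate to refill."""
    pop = [random.random() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: fitness(merge(w1, w2, a)), reverse=True)
        survivors = pop[: pop_size // 2]
        children = [min(1.0, max(0.0, a + random.gauss(0, 0.1)))
                    for a in survivors]
        pop = survivors + children
    return pop[0]

# Hypothetical specialists whose best blend is 70/30:
math_model, code_model = [1.0, 0.0], [0.0, 1.0]
target = [0.7, 0.3]
fitness = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
best = evolve_alpha(math_model, code_model, fitness)
```

No gradients touch the weights; selection on downstream fitness does all the work, which is what makes this "post-training" in the evolutionary school's sense.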
6.4 Inference-time — the new scaling axis
o1 (September 2024), o3 (December 2024), DeepSeek-R1 (January 2025), Gemini Deep Think (2025), Kimi k1.5, Qwen QwQ-32B all exploit a second scaling axis: test-time compute. Snell et al. (arXiv 2408.03314, DeepMind/Berkeley, August 2024) showed compute-optimal allocation of test-time compute can beat a 14× larger model in FLOP-matched evaluation.
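The simplest form of test-time scaling is best-of-N sampling scored by a verifier; Snell et al. study smarter allocations, but the mechanism is the same. A toy sketch in which the "model" and "verifier" are stand-in functions:

```python
import random

random.seed(7)

def model_sample(prompt: str) -> str:
    """Stand-in for one stochastic completion from a fixed, weak model:
    a noisy guess at 6 * 7 that is right only some of the time."""
    return str(random.choice([41, 42, 43, 44, 42, 40]))

def verifier_score(prompt: str, answer: str) -> float:
    """Programmatic check (here: re-derive the arithmetic). Real systems
    use unit tests, proof checkers, or a learned verifier."""
    return 1.0 if answer == str(6 * 7) else 0.0

def best_of_n(prompt: str, n: int) -> str:
    """Spend more test-time compute (larger n) to extract a better answer
    from the same fixed weights."""
    samples = [model_sample(prompt) for _ in range(n)]
    return max(samples, key=lambda s: verifier_score(prompt, s))

answer = best_of_n("What is 6 * 7?", n=64)  # almost surely "42"
```

The model never changes; only the inference budget does, which is why test-time compute is a second, independent scaling axis.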
Philosophically, inference-time reasoning is the cleanest instantiation of System 2 in current systems: deliberate, serial, token-budgeted, revisable. It also flirts with the neuro-symbolic school when the chain of thought includes tool calls to verifiers (AlphaProof's RL-on-Lean loop, reasoning models with calculator/code tools).
Pipeline-to-philosophy table
| Stage | Intervention | School | Representative work |
|---|---|---|---|
| Data curation | Causal/diverse data | Causal | Schölkopf causal ML |
| Pre-training | Scale tokens+params+compute | Connectionist | Chinchilla (2022) |
| Pre-training | Latent predictive objective | Embodied | V-JEPA 2 (2025) |
| Architecture | Inductive biases | Kantian hybrid | Mamba, MoE, JEPA |
| Architecture | Categorical structure | Symbolic | Symbolica CDL |
| Architecture | Generative Bayesian modules | Active inference | VERSES RGMs |
| Post-training | RLHF / DPO | Dialectic | Ouyang 2022, Rafailov 2023 |
| Post-training | Constitutional AI | Dialectic + rule-based | Bai 2022 |
| Post-training | RLVR / GRPO | Empirical + formal | DeepSeek-R1 (2025) |
| Post-training | Evolutionary merge | Evolutionary | Sakana (2024) |
| Inference-time | Long chain-of-thought | Neuro-symbolic | o1/o3, Deep Think |
| Inference-time | Search + verifier | Symbolic + neural | AlphaProof, AG2 |