Reinforcement Learning

What this subject is for: Learning from delayed reward — MDPs, policy gradients, actor-critic, model-based RL, world models, RLHF.

Track status: 8 substantive concept pages. See the live generation status and the latest retrospective.

Concepts

Arcs through this subject

No arcs yet — the retrospective proposes these once concept coverage hits ≥4 pages per track.

Key thinkers

Author pages pending.

Builds tied to this subject

MVB recipes pending; for now they live in the "Build it" sections of the concept pages.


Auto-rebuilt from filesystem state by scripts/rebuild_track_indexes.py — see system architecture.