Reinforcement Learning

What this subject is for: learning from delayed reward, covering MDPs, policy gradients, actor-critic, model-based RL, world models, and RLHF.

Track status: 8 substantive concept pages. See the live generation status and the latest retrospective.

Concepts

Author pages pending.

MVB recipes pending; for now they live inside the concept pages' Build it sections.

Auto-rebuilt from filesystem state by scripts/rebuild_track_indexes.py (see system architecture).