Reinforcement Learning
What this subject is for: learning from delayed reward, covering MDPs, policy gradients, actor-critic methods, model-based RL, world models, and reinforcement learning from human feedback (RLHF).
Track status: 8 substantive concept pages. See the live generation status and the latest retrospective.
Concepts
- Actor-Critic
- Markov Decision Processes
- Model-Based Reinforcement Learning
- Policy gradient
- Policy Gradients
- Proximal Policy Optimization
- Q-learning
- World models
Arcs through this subject
No arcs yet. The retrospective proposes arcs once concept coverage reaches at least 4 pages per track.
Key thinkers
Author pages pending.
Builds tied to this subject
MVB recipes pending. For now they live inside the "Build it" sections of the concept pages.
Auto-rebuilt from filesystem state by scripts/rebuild_track_indexes.py; see system architecture.