Arcs¶

An arc is a diagonal learning path: from a tool you already touch, through a broader frame, to a synthesised capability, landing at the intersection of two active research areas. Each arc names a specific frontier destination you build toward. The MVB at each step is the recipe; the arc is the journey.

8 live · 9 designed and next-up · 13 waiting on missing concept pages · 30 total

Status meanings:

🟢 live = readable on the site now.

🟡 designed · next = the arc is designed in the roadmap and all 5 concept pages it needs exist; the autonomous loop will write it the next time it runs.

🟠 waiting on missing concept pages = the arc is designed, but one or more of its concept pages must be written first; the retrospective auto-seeds those concept pages, and then the arc unlocks.
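
The status rules above can be read as a small decision procedure. The sketch below is only an illustration of that logic, not the code in scripts/build_arc_catalog.py: the `Arc` class, its field names, and the short status strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """Hypothetical record for one roadmap arc (field names are assumptions)."""
    slug: str
    concepts: list[str]    # concept-page slugs the arc chains through
    written: bool = False  # True once the arc page itself is on the site


def arc_status(arc: Arc, existing_concepts: set[str]) -> str:
    """Return 'live', 'next', or 'waiting' following the status meanings above."""
    if arc.written:
        return "live"
    # Any concept page that does not exist yet blocks the arc.
    missing = [c for c in arc.concepts if c not in existing_concepts]
    return "waiting" if missing else "next"


def tally(arcs: list[Arc], existing_concepts: set[str]) -> dict[str, int]:
    """Count arcs per status, as in the summary line at the top of the page."""
    counts = {"live": 0, "next": 0, "waiting": 0}
    for arc in arcs:
        counts[arc_status(arc, existing_concepts)] += 1
    return counts
```

Under these assumptions, an arc moves from `waiting` to `next` as soon as its last missing concept slug appears in `existing_concepts`, with no change to the arc itself.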

AI¶

agentic-rlvr-reasoner ¶

🟢 live — read now · track: AI

Destination — A small LLM (1–7B) fine-tuned with RLVR on verifiable math/code rewards, evaluated on a held-out reasoning benchmark — measured to recover the R1-Zero-style behaviour at ≥60% of the published gain

chain-of-thought → in-context-learning → reward-modeling → rlhf → mixture-of-experts

`mechanistic-interpretability-with-saes`¶

🟠 designed · waiting on missing concept pages · track: AI

Destination — A working sparse-autoencoder trained on a small open-weight LLM's MLP activations, extracting features that match a published circuit (induction heads / IOI / greater-than) on the same layer

mechanistic-interpretability → sparse-autoencoders → feature-circuits → attention → mixture-of-experts

`alignment-via-cot-monitoring`¶

🟠 designed · waiting on missing concept pages · track: AI

Destination — A working CoT monitor that flags deceptive reasoning in a frontier model's outputs, calibrated to OpenAI's published rate on the chain-of-thought-monitoring benchmark

chain-of-thought → alignment-safety → mechanistic-interpretability → cot-monitoring → reward-modeling

Generative Modeling¶

generative-stack ¶

🟢 live — read now · track: Generative Modeling

Destination — Five trained generative models (DDPM, score-based, latent diffusion, flow-matching, consistency distillation) compared head-to-head on the same dataset with reported FID, sample diversity, and inference latency

diffusion-models → score-matching → latent-diffusion-models → flow-matching → consistency-models

`world-models-from-video`¶

🟠 designed · waiting on missing concept pages · track: Generative Modeling

Destination — A small video diffusion model conditioned on actions, trained on a driving or game dataset, producing 16-frame rollouts that respect physical conservation on a held-out test set

diffusion-models → flow-matching → video-generation → world-models → physical-consistency

`controllable-and-distilled-generation`¶

🟠 designed · waiting on missing concept pages · track: Generative Modeling

Destination — A finetuned latent diffusion model with ControlNet-style conditioning on edge maps, distilled to single-step sampling at <100 ms per image on an A10 GPU

latent-diffusion-models → consistency-models → controlnet → lora-finetuning → variational-autoencoders

Representation Learning¶

self-supervised-vision-foundations ¶

🟢 live — read now · track: Representation Learning

Destination — A vision encoder trained without labels on a 100k-image dataset (SimCLR + MAE hybrid) that transfers to ImageNet linear-probe ≥75% top-1

simclr → contrastive-learning → data-augmentation → masked-autoencoders → representation-learning

`world-model-representations`¶

🟠 designed · waiting on missing concept pages · track: Representation Learning

Destination — A JEPA-style world model on a video sequence, where the latent representations support a planner reaching a goal state with measured success rate vs a reactive baseline

jepa → masked-autoencoders → world-models → model-based-reinforcement-learning → latent-dynamics

`multimodal-encoders`¶

🟠 designed · waiting on missing concept pages · track: Representation Learning

Destination — A CLIP-class encoder trained on a 1M image-text dataset, evaluated by zero-shot transfer on three downstream tasks (CIFAR-10, food-101, custom domain)

contrastive-learning → simclr → clip-architecture → vision-language-pretraining → representation-learning

Neural Networks & Deep Learning¶

training-fundamentals ¶

🟢 live — read now · track: Neural Networks & Deep Learning

Destination — A from-scratch CNN trained on CIFAR-10 to ≥85% test accuracy with documented loss curves, normalization choices, and a learned schedule — written up like a small lab notebook

backpropagation → gradient-descent → adaptive-optimizers → regularization → batch-normalization

`scaling-and-emergence`¶

🟡 designed · next to be written · track: Neural Networks & Deep Learning

Destination — A small empirical scaling study across 3 model sizes (10M, 100M, 300M params) on a fixed token budget, fitting Chinchilla-style scaling laws and reporting where emergence appears on a target task

scaling-laws → optimization → emergence → double-descent → mixture-of-experts

`efficient-large-model-training`¶

🟡 designed · next to be written · track: Neural Networks & Deep Learning

Destination — A 1B-parameter model trained across 8 GPUs with mixed-precision, ZeRO-style sharding, and gradient bucketing — reporting throughput in tokens/sec/GPU and convergence curves vs the single-GPU baseline

gradient-descent → adaptive-optimizers → mixed-precision-training → data-parallelism → tensor-parallelism

Statistical & Probabilistic ML¶

bayesian-deep-learning ¶

🟢 live — read now · track: Statistical & Probabilistic ML

Destination — A Bayesian neural network on a real production-like dataset, reporting calibration error and showing the uncertainty is actionable (predictive entropy correlates with held-out errors)

bayesian-inference → variational-inference → bayesian-neural-networks → uncertainty-quantification → gaussian-processes
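
The two quantities named in this destination, calibration error and predictive entropy, can be sketched in a few lines. This is a minimal illustration assuming a classifier that outputs class probabilities; the equal-width binning, the function names, and the default of 10 bins are assumptions and are not taken from the arc itself.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, right-inclusive, so bin ids fall in 0..n_bins-1
    # and a confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return float(ece)


def predictive_entropy(probs, eps=1e-12):
    """Entropy of each predictive distribution (rows of probs), in nats."""
    probs = np.asarray(probs, dtype=float)
    return -(probs * np.log(probs + eps)).sum(axis=-1)
```

One way to check that the uncertainty is actionable, in the sense the destination describes, is to compare `predictive_entropy` on held-out examples the model got wrong against those it got right; higher entropy on the errors is the pattern to look for.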

probabilistic-programming-end-to-end ¶

🟢 live — read now · track: Statistical & Probabilistic ML

Destination — A hierarchical Bayesian model fit in Pyro / NumPyro on a real dataset, with full posterior inference (MCMC or VI), posterior predictive checks, and a credible interval that beats a frequentist baseline

probabilistic-programming → bayesian-inference → variational-inference → mcmc → gaussian-processes

`causal-bayesian-inference`¶

🟡 designed · next to be written · track: Statistical & Probabilistic ML

Destination — A Bayesian instrumental-variables model fit on an observational dataset, recovering a causal effect with credible interval, compared against a naive regression baseline

bayesian-inference → variational-inference → instrumental-variables → causal-discovery → counterfactuals

Reinforcement Learning¶

`rl-for-post-training`¶

🟡 designed · next to be written · track: Reinforcement Learning

Destination — A 7B open-weight LLM finetuned via GRPO with verifiable rewards on a math/code task, measured to recover ≥60% of DeepSeek R1's published gain on AIME 2024 starting from the same base

policy-gradient → ppo → actor-critic → reward-modeling → rlhf

world-models-and-imagination ¶

🟢 live — read now · track: Reinforcement Learning

Destination — A Dreamer-class agent that learns a latent world model from environment rollouts and plans in imagination, beating a model-free baseline by ≥30% sample efficiency on a control task

mdp → policy-gradient → model-based-reinforcement-learning → world-models → q-learning

`agentic-rl-with-tools`¶

🟠 designed · waiting on missing concept pages · track: Reinforcement Learning

Destination — A small LLM agent that calls a web-search tool and a calculator tool, learns from execution outcomes via GRPO, and is measured on a multi-step QA benchmark against the no-RL baseline

ppo → reward-modeling → rlhf → tool-use → multi-turn-rl

Attention, Memory, Reasoning, Continual¶

`long-context-attention`¶

🟠 designed · waiting on missing concept pages · track: Attention, Memory, Reasoning, Continual

Destination — A small Transformer trained with FlashAttention 2 + ring attention reaching 128K context with sub-quadratic memory, evaluated on a long-context retrieval task (needle-in-haystack ≥95%)

attention → multi-head-attention → long-context → flash-attention → ring-attention

`retrieval-and-memory`¶

🟠 designed · waiting on missing concept pages · track: Attention, Memory, Reasoning, Continual

Destination — A RAG pipeline with hybrid retrieval (BM25 + dense embeddings) feeding a long-context LLM, evaluated on a domain-specific QA benchmark — measured against pure-long-context and pure-retrieval baselines

retrieval-augmented-generation → in-context-learning → long-context → vector-search → embedding-models

`reasoning-and-test-time-compute`¶

🟠 designed · waiting on missing concept pages · track: Attention, Memory, Reasoning, Continual

Destination — A small reasoning model fine-tuned with CoT + self-verification, evaluated on MATH-500 — comparing test-time compute regimes (greedy / beam / majority vote / R1-style self-verify)

chain-of-thought → in-context-learning → attention → self-verification → test-time-compute

Causal & Statistical Inference¶

causal-deep-learning ¶

🟢 live — read now · track: Causal & Statistical Inference

Destination — A neural net trained to predict counterfactual outcomes under a hidden confounder, recovering the true treatment effect within 10% on a semi-synthetic dataset

structural-causal-models → do-calculus → counterfactuals → causal-representation-learning → potential-outcomes

`causal-rl`¶

🟠 designed · waiting on missing concept pages · track: Causal & Statistical Inference

Destination — An RL agent in a structural causal environment that learns to intervene (not just observe) — measured by causal effect recovery vs an observational baseline

counterfactuals → do-calculus → policy-gradient → world-models → causal-intervention

`causal-discovery-in-practice`¶

🟡 designed · next to be written · track: Causal & Statistical Inference

Destination — Run NOTEARS or the PC algorithm on a real observational dataset (e.g. gene expression), recover a causal graph, validate it against held-out interventions, and report identifiability limits

causal-discovery → structural-causal-models → instrumental-variables → mediation-analysis → potential-outcomes

Algorithms & Systems for AI¶

`serve-an-llm-efficiently`¶

🟡 designed · next to be written · track: Algorithms & Systems for AI

Destination — A quantized 7B model served behind an endpoint with measured p95 latency under 100 ms, with kv-cache + FlashAttention + INT8 quantization wired and benchmarked at batch sizes 1, 4, 16

flash-attention → kv-cache → kv-cache-management → quantization → llm-inference

`train-at-scale`¶

🟡 designed · next to be written · track: Algorithms & Systems for AI

Destination — Distributed training of a 1B-parameter model across 8 GPUs with mixed precision, ZeRO-3 sharding, and pipeline parallelism — reporting tokens/sec/GPU vs the single-GPU baseline and identifying the dominant cost (compute vs communication)

distributed-training → data-parallelism → tensor-parallelism → pipeline-parallelism → mixed-precision-training

`compiler-and-kernel-fusion`¶

🟠 designed · waiting on missing concept pages · track: Algorithms & Systems for AI

Destination — Take a transformer block, profile it, identify the bottleneck, write a fused kernel (Triton or CUDA) for it, and measure the speedup against the PyTorch eager baseline

flash-attention → compiler-optimizations-for-ml → automatic-differentiation → triton-kernels → tensor-cores

Complexity, Cognition & Natural Intelligence¶

`scaling-laws-empirical`¶

🟡 designed · next to be written · track: Complexity, Cognition & Natural Intelligence

Destination — Empirically fit Chinchilla-style scaling laws on a 3-point series of model sizes (10M / 100M / 300M) on a fixed dataset, reporting the compute-optimal ratio you recover vs the published one

scaling-laws → emergence → double-descent → generalization → scaling-collapse
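
As a rough illustration of the fitting step in this destination, the sketch below fits a pure power law, loss ≈ A · N^(−α), by linear regression in log-log space. This is a deliberate simplification: the published Chinchilla-style form also includes an irreducible-loss term and a data-size term, which are dropped here. The function name and the synthetic numbers in the usage note are assumptions for illustration only, not results.

```python
import numpy as np


def fit_power_law(param_counts, losses):
    """Fit loss = A * N**(-alpha) by least squares on log-transformed data.

    Returns (A, alpha). Assumes all inputs are strictly positive.
    """
    log_n = np.log(np.asarray(param_counts, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # log(loss) = log(A) - alpha * log(N), i.e. a straight line in log-log space.
    slope, intercept = np.polyfit(log_n, log_l, 1)
    return float(np.exp(intercept)), float(-slope)
```

For example, feeding in the three model sizes from the destination (10M, 100M, 300M) with losses generated from a known exponent should return that exponent. With only three points the fit has very little slack, so the recovered values are best treated as a sanity check rather than a precise estimate.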

`emergence-and-double-descent`¶

🟡 designed · next to be written · track: Complexity, Cognition & Natural Intelligence

Destination — An empirical demo of double descent on a small classifier (varying model width across the bias-variance frontier), plus one emergence-style task where capability appears suddenly with scale

double-descent → generalization → scaling-laws → emergence → compositionality

`compositionality-and-generalization`¶

🟠 designed · waiting on missing concept pages · track: Complexity, Cognition & Natural Intelligence

Destination — Train a small Transformer on a compositional task (SCAN, COGS, or CFQ), measure systematic generalization to held-out compositions — vs a recurrent baseline

compositionality → generalization → systematic-generalization → attention → in-context-learning

This page is auto-rebuilt from docs/system/arc-roadmap.md by scripts/build_arc_catalog.py. Refreshed each cycle.
