Arcs

An arc is a diagonal learning path: from a tool you already touch, through a broader frame, to a synthesised capability, landing at the intersection of two active research areas. Each arc names a specific frontier destination you build toward. The MVB at each step is the recipe; the arc is the journey.

8 live · 9 designed and next-up · 13 waiting on missing concept pages · 30 total

Status meanings:

🟢 live = readable on the site now.

🟡 designed · next to be written = the arc is designed in the roadmap and all 5 concept pages it needs exist; the autonomous loop writes it on its next run.

🟠 designed · waiting on missing concept pages = the arc is designed, but one or more of its concept pages must be written first; the retrospective auto-seeds those pages, and then the arc unlocks.
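
The three status rules above can be sketched as a small function. This is an illustration only, not the logic in scripts/build_arc_catalog.py: the `Arc` record, its field names, and `arc_status` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """Hypothetical record for one arc; the field names are illustrative."""
    slug: str
    concept_pages: list[str]   # concept pages the arc needs
    written: bool = False      # True once the arc itself is readable on the site


def arc_status(arc: Arc, existing_pages: set[str]) -> str:
    """Map an arc to one of the three statuses described above."""
    if arc.written:
        return "live"
    missing = [page for page in arc.concept_pages if page not in existing_pages]
    if missing:
        return "designed · waiting on missing concept pages"
    return "designed · next to be written"


# Assumed set of concept pages that already exist (made up for the example).
existing = {"scaling-laws", "optimization", "emergence",
            "double-descent", "mixture-of-experts"}
arc = Arc("scaling-and-emergence",
          ["scaling-laws", "optimization", "emergence",
           "double-descent", "mixture-of-experts"])
print(arc_status(arc, existing))  # designed · next to be written
```

Under this sketch an arc becomes writable as soon as its last missing concept page appears in `existing_pages`, which matches the "then the arc unlocks" behaviour described above.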


AI

agentic-rlvr-reasoner

🟢 live — read now · track: AI

Destination — A small LLM (1–7B) fine-tuned with RLVR on verifiable math/code rewards, evaluated on a held-out reasoning benchmark — measured to recover the R1-Zero-style behaviour at ≥60% of the published gain

chain-of-thought · in-context-learning · reward-modeling · rlhf · mixture-of-experts

mechanistic-interpretability-with-saes

🟠 designed · waiting on missing concept pages · track: AI

Destination — A working sparse-autoencoder trained on a small open-weight LLM's MLP activations, extracting features that match a published circuit (induction heads / IOI / greater-than) on the same layer

mechanistic-interpretability · sparse-autoencoders · feature-circuits · attention · mixture-of-experts

alignment-via-cot-monitoring

🟠 designed · waiting on missing concept pages · track: AI

Destination — A working CoT monitor that flags deceptive reasoning in a frontier model's outputs, calibrated to OpenAI's published rate on the chain-of-thought-monitoring benchmark

chain-of-thought · alignment-safety · mechanistic-interpretability · cot-monitoring · reward-modeling


Generative Modeling

generative-stack

🟢 live — read now · track: Generative Modeling

Destination — Five trained generative models (DDPM, score-based, latent diffusion, flow-matching, consistency distillation) compared head-to-head on the same dataset with reported FID, sample diversity, and inference latency

diffusion-models · score-matching · latent-diffusion-models · flow-matching · consistency-models

world-models-from-video

🟠 designed · waiting on missing concept pages · track: Generative Modeling

Destination — A small video diffusion model conditioned on actions, trained on a driving or game dataset, producing 16-frame rollouts that respect physical conservation on a held-out test set

diffusion-models · flow-matching · video-generation · world-models · physical-consistency

controllable-and-distilled-generation

🟠 designed · waiting on missing concept pages · track: Generative Modeling

Destination — A finetuned latent diffusion model with ControlNet-style conditioning on edge maps, distilled to single-step sampling at <100 ms per image on an A10 GPU

latent-diffusion-models · consistency-models · controlnet · lora-finetuning · variational-autoencoders


Representation Learning

self-supervised-vision-foundations

🟢 live — read now · track: Representation Learning

Destination — A vision encoder trained without labels on a 100k-image dataset (SimCLR + MAE hybrid) that transfers to ImageNet linear-probe ≥75% top-1

simclr · contrastive-learning · data-augmentation · masked-autoencoders · representation-learning

world-model-representations

🟠 designed · waiting on missing concept pages · track: Representation Learning

Destination — A JEPA-style world model on a video sequence, where the latent representations support a planner reaching a goal state with measured success rate vs a reactive baseline

jepa · masked-autoencoders · world-models · model-based-reinforcement-learning · latent-dynamics

multimodal-encoders

🟠 designed · waiting on missing concept pages · track: Representation Learning

Destination — A CLIP-class encoder trained on a 1M image-text dataset, evaluated by zero-shot transfer on three downstream tasks (CIFAR-10, Food-101, custom domain)

contrastive-learning · simclr · clip-architecture · vision-language-pretraining · representation-learning


Neural Networks & Deep Learning

training-fundamentals

🟢 live — read now · track: Neural Networks & Deep Learning

Destination — A from-scratch CNN trained on CIFAR-10 to ≥85% test accuracy with documented loss curves, normalization choices, and a learned schedule — written up like a small lab notebook

backpropagation · gradient-descent · adaptive-optimizers · regularization · batch-normalization

scaling-and-emergence

🟡 designed · next to be written · track: Neural Networks & Deep Learning

Destination — A small empirical scaling study across 3 model sizes (10M, 100M, 300M params) on a fixed token budget, fitting Chinchilla-style scaling laws and reporting where emergence appears on a target task

scaling-laws · optimization · emergence · double-descent · mixture-of-experts

efficient-large-model-training

🟡 designed · next to be written · track: Neural Networks & Deep Learning

Destination — A 1B-parameter model trained across 8 GPUs with mixed-precision, ZeRO-style sharding, and gradient bucketing — reporting throughput in tokens/sec/GPU and convergence curves vs the single-GPU baseline

gradient-descent · adaptive-optimizers · mixed-precision-training · data-parallelism · tensor-parallelism


Statistical & Probabilistic ML

bayesian-deep-learning

🟢 live — read now · track: Statistical & Probabilistic ML

Destination — A Bayesian neural network on a real production-like dataset, reporting calibration error and showing the uncertainty is actionable (predictive entropy correlates with held-out errors)

bayesian-inference · variational-inference · bayesian-neural-networks · uncertainty-quantification · gaussian-processes
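
One common way to report calibration error is expected calibration error (ECE) over equal-width confidence bins. The sketch below assumes a classifier that emits one confidence per prediction. The function name, the bin count, and the binning scheme are assumptions for illustration, not something this arc prescribes.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: the size-weighted mean of |accuracy - confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A value near zero means stated confidence tracks observed accuracy. It says nothing by itself about whether predictive entropy correlates with held-out errors, which this destination checks separately.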

probabilistic-programming-end-to-end

🟢 live — read now · track: Statistical & Probabilistic ML

Destination — A hierarchical Bayesian model fit in Pyro / NumPyro on a real dataset, with full posterior inference (MCMC or VI), posterior predictive checks, and a credible interval that beats a frequentist baseline

probabilistic-programming · bayesian-inference · variational-inference · mcmc · gaussian-processes

causal-bayesian-inference

🟡 designed · next to be written · track: Statistical & Probabilistic ML

Destination — A Bayesian instrumental-variables model fit on an observational dataset, recovering a causal effect with credible interval, compared against a naive regression baseline

bayesian-inference · variational-inference · instrumental-variables · causal-discovery · counterfactuals


Reinforcement Learning

rl-for-post-training

🟡 designed · next to be written · track: Reinforcement Learning

Destination — A 7B open-weight LLM finetuned via GRPO with verifiable rewards on a math/code task, measured to recover ≥60% of DeepSeek R1's published gain on AIME 2024 starting from the same base

policy-gradient · ppo · actor-critic · reward-modeling · rlhf
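
The core of GRPO's credit assignment is a group-relative advantage: several completions are sampled for the same prompt, and each reward is normalised against the group's mean and standard deviation, so no learned value function is needed. A minimal sketch follows, assuming scalar verifiable rewards. The function name, the `eps` term, and the use of the population standard deviation are assumptions, and implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each reward against the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some code uses sample std
    # eps avoids division by zero when every sample in the group gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt: two passed the verifier, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every completion in a group earns the same reward, all advantages come out as zero, so that prompt contributes no gradient signal in this sketch.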

world-models-and-imagination

🟢 live — read now · track: Reinforcement Learning

Destination — A Dreamer-class agent that learns a latent world model from environment rollouts and plans in imagination, beating a model-free baseline by ≥30% sample efficiency on a control task

mdp · policy-gradient · model-based-reinforcement-learning · world-models · q-learning

agentic-rl-with-tools

🟠 designed · waiting on missing concept pages · track: Reinforcement Learning

Destination — A small LLM agent that calls a web-search + a calculator tool, learns from execution outcomes via GRPO, measured on a multi-step QA benchmark vs the no-RL baseline

ppo · reward-modeling · rlhf · tool-use · multi-turn-rl


Attention, Memory, Reasoning, Continual

long-context-attention

🟠 designed · waiting on missing concept pages · track: Attention, Memory, Reasoning, Continual

Destination — A small Transformer trained with FlashAttention 2 + ring attention reaching 128K context with sub-quadratic memory, evaluated on a long-context retrieval task (needle-in-haystack ≥95%)

attention · multi-head-attention · long-context · flash-attention · ring-attention

retrieval-and-memory

🟠 designed · waiting on missing concept pages · track: Attention, Memory, Reasoning, Continual

Destination — A RAG pipeline with hybrid retrieval (BM25 + dense embeddings) feeding a long-context LLM, evaluated on a domain-specific QA benchmark — measured against pure-long-context and pure-retrieval baselines

retrieval-augmented-generation · in-context-learning · long-context · vector-search · embedding-models
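
The arc does not say how the BM25 and dense rankings are combined. One simple option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the arc's chosen method. The function name is hypothetical, and `k=60` is a commonly used default, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in,
    and the per-list scores are summed.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]    # hypothetical BM25 result order
dense_ranking = ["b", "c", "a"]   # hypothetical dense-embedding result order
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))  # ['b', 'a', 'c']
```

RRF uses only ranks, so the BM25 and embedding scores never need to be put on a common scale.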

reasoning-and-test-time-compute

🟠 designed · waiting on missing concept pages · track: Attention, Memory, Reasoning, Continual

Destination — A small reasoning model fine-tuned with CoT + self-verification, evaluated on MATH-500 — comparing test-time compute regimes (greedy / beam / majority vote / R1-style self-verify)

chain-of-thought · in-context-learning · attention · self-verification · test-time-compute
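
Of the test-time compute regimes listed, majority vote is the simplest to sketch: sample several reasoning chains, extract each final answer, and return the most frequent one. The function name and the tie-breaking rule (earliest sampled answer wins) are assumptions for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled completions."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    best = max(counts.values())
    # Ties go to the answer that was sampled first.
    for answer in answers:
        if counts[answer] == best:
            return answer


# Final answers extracted from four sampled chains (made-up values).
print(majority_vote(["42", "41", "42", "7"]))  # 42
```

Cost grows linearly with the number of samples, which makes it a natural axis for comparing against greedy decoding at equal compute.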


Causal & Statistical Inference

causal-deep-learning

🟢 live — read now · track: Causal & Statistical Inference

Destination — A neural net trained to predict counterfactual outcomes under a hidden confounder, recovering the true treatment effect within 10% on a semi-synthetic dataset

structural-causal-models · do-calculus · counterfactuals · causal-representation-learning · potential-outcomes

causal-rl

🟠 designed · waiting on missing concept pages · track: Causal & Statistical Inference

Destination — An RL agent in a structural causal environment that learns to intervene (not just observe) — measured by causal effect recovery vs an observational baseline

counterfactuals · do-calculus · policy-gradient · world-models · causal-intervention

causal-discovery-in-practice

🟡 designed · next to be written · track: Causal & Statistical Inference

Destination — Run NOTEARS / PC algorithm on a real observational dataset (e.g. gene expression), recover a causal graph, validate against held-out interventions, report identifiability limits

causal-discovery · structural-causal-models · instrumental-variables · mediation-analysis · potential-outcomes


Algorithms & Systems for AI

serve-an-llm-efficiently

🟡 designed · next to be written · track: Algorithms & Systems for AI

Destination — A quantized 7B model served behind an endpoint with measured p95 latency under 100 ms, with kv-cache + FlashAttention + INT8 quantization wired and benchmarked at batch sizes 1, 4, 16

flash-attention · kv-cache · kv-cache-management · quantization · llm-inference
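
The p95 latency target can be checked from a list of per-request timings. The sketch below uses the nearest-rank percentile definition, which is one of several conventions. The function name is hypothetical, and the choice of definition is an assumption.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_ms, target_ms=100.0):
    """True when the measured p95 latency is under the target."""
    return percentile_nearest_rank(latencies_ms, 95) < target_ms
```

Running the check separately for each batch size (1, 4, 16) keeps a good result at one batch size from masking a miss at another.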

train-at-scale

🟡 designed · next to be written · track: Algorithms & Systems for AI

Destination — Distributed training of a 1B-parameter model across 8 GPUs with mixed precision, ZeRO-3 sharding, and pipeline parallelism — reporting tokens/sec/GPU vs the single-GPU baseline and identifying the dominant cost (compute vs communication)

distributed-training · data-parallelism · tensor-parallelism · pipeline-parallelism · mixed-precision-training

compiler-and-kernel-fusion

🟠 designed · waiting on missing concept pages · track: Algorithms & Systems for AI

Destination — Take a transformer block, profile it, identify the bottleneck, write a fused kernel (Triton or CUDA) for it, and measure the speedup against the PyTorch eager baseline

flash-attention · compiler-optimizations-for-ml · automatic-differentiation · triton-kernels · tensor-cores


Complexity, Cognition & Natural Intelligence

scaling-laws-empirical

🟡 designed · next to be written · track: Complexity, Cognition & Natural Intelligence

Destination — Empirically fit Chinchilla-style scaling laws on a 3-point series of model sizes (10M / 100M / 300M) on a fixed dataset, reporting the compute-optimal ratio you recover vs the published one

scaling-laws · emergence · double-descent · generalization · scaling-collapse
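
A first pass at fitting a scaling curve to a three-point series is a straight-line fit in log-log space. The sketch below is a deliberate simplification: it fits a pure power law, loss = a · size^(-b), and omits the irreducible-loss term and the data-size term that a full Chinchilla-style fit includes. The function name is hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b) in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("need at least two distinct model sizes")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope
```

With only three sizes the fit has a single residual degree of freedom, so the recovered exponent is best reported alongside that caveat.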

emergence-and-double-descent

🟡 designed · next to be written · track: Complexity, Cognition & Natural Intelligence

Destination — An empirical demo of double descent on a small classifier (varying model width across the bias-variance frontier), plus one emergence-style task where capability appears suddenly with scale

double-descent · generalization · scaling-laws · emergence · compositionality

compositionality-and-generalization

🟠 designed · waiting on missing concept pages · track: Complexity, Cognition & Natural Intelligence

Destination — Train a small Transformer on a compositional task (SCAN, COGS, or CFQ), measure systematic generalization to held-out compositions — vs a recurrent baseline

compositionality · generalization · systematic-generalization · attention · in-context-learning


This page is auto-rebuilt from docs/system/arc-roadmap.md by scripts/build_arc_catalog.py. Refreshed each cycle.