Causal models provide the right abstraction for robust, transferable representations, and the independent causal mechanisms (ICM) principle bridges causality and representation learning.
Reading notes
Representations that capture true causal structure remain stable under distribution shift, unlike purely statistical features that exploit spurious correlations.
“Causal models can be seen as the correct abstraction level for generalizing across domains.”
The ICM principle states that causal generative mechanisms are autonomous modules: changing one does not affect the others. Representations aligned with these mechanisms inherit their invariance.
Each mechanism in a causal system operates independently — changing one mechanism does not alter the others.
“The mechanisms of the causal generative model are autonomous and do not inform or influence each other.”
Nature's generative process factorizes into independent modules, one for each variable given its direct causes (its parents in the causal graph). This is a structural assumption about how the world generates data.
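The independence of mechanisms can be illustrated with a small simulation. The sketch below is not taken from the source: it assumes a hypothetical two-variable linear model X -> Y with Gaussian noise, and the names `simulate` and `slope` are illustrative. Only the mechanism p(X) is changed between the two samples. The regression of Y on X (the causal direction) should stay close to the fixed coefficient, while the regression of X on Y (the anticausal direction) shifts.

```python
import numpy as np

def simulate(n, x_std, rng):
    """Sample from a toy model X -> Y. Only p(X) depends on x_std."""
    x = rng.normal(0.0, x_std, size=n)            # mechanism 1: p(X)
    y = 2.0 * x + rng.normal(0.0, 1.0, size=n)    # mechanism 2: p(Y | X), held fixed
    return x, y

def slope(a, b):
    """Least-squares slope of b regressed on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

rng = np.random.default_rng(0)
x0, y0 = simulate(50_000, 1.0, rng)   # original distribution
x1, y1 = simulate(50_000, 3.0, rng)   # intervention on p(X) only

# Causal direction: both slopes stay close to the fixed coefficient 2.0.
causal_slopes = (slope(x0, y0), slope(x1, y1))

# Anticausal direction: analytically 2*s^2 / (4*s^2 + 1), so roughly 0.40 vs 0.49.
anticausal_slopes = (slope(y0, x0), slope(y1, x1))

print("causal:", causal_slopes)
print("anticausal:", anticausal_slopes)
```

In this toy setting the causal factorization p(X) p(Y | X) has one component that changes and one that does not, whereas in the anticausal factorization p(Y) p(X | Y) both components change when p(X) is modified.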
Learning statistically independent latent factors does not guarantee that factors correspond to true causal variables or support interventional reasoning.
“Without further assumptions, unsupervised disentanglement is fundamentally impossible.”
Disentanglement methods optimize for statistical independence, but under an isotropic Gaussian prior, for example, the latent components can be rotated arbitrarily without changing the likelihood. Only further assumptions, such as causal structure, break this symmetry.
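The rotation argument can be checked numerically. The sketch below is an illustration rather than anything from the source: it assumes a two-dimensional standard normal prior over the latents, for which the symmetry is exact, and the helper `gaussian_log_lik` is a hypothetical name. A rotated code has the same log-likelihood under the prior, yet each rotated coordinate is a mixture of the original factors.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((10_000, 2))    # independent, unit-variance latents

theta = np.pi / 5                        # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z_rot = z @ R.T                          # a rotated, equally "independent" code

def gaussian_log_lik(v):
    """Total log-density of the rows of v under a standard normal N(0, I)."""
    d = v.shape[1]
    per_sample = -0.5 * np.sum(v**2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    return float(np.sum(per_sample))

ll_original = gaussian_log_lik(z)
ll_rotated = gaussian_log_lik(z_rot)     # equal up to floating-point error

# The first rotated coordinate mixes both original factors:
# its correlation with the second original factor is about -sin(theta).
mixing = np.corrcoef(z_rot[:, 0], z[:, 1])[0, 1]

print(ll_original, ll_rotated, mixing)
```

The likelihood alone cannot tell `z` from `z_rot`, so nothing in the objective prefers the coordinates that match the true generative factors.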
Observing data across multiple environments provides the contrastive signal needed to distinguish causal from spurious features: causal features stay stable, while spurious ones shift.
“Distribution shifts correspond to local interventions on the causal model, providing a natural supervision signal.”
Single-environment data is ambiguous: causal and spurious features can predict the target equally well. Multiple environments break this symmetry because only the causal, invariant relationships persist across them.
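A toy simulation makes the multi-environment argument concrete. The sketch below is an assumption-laden illustration, not the source's method: it uses a hypothetical linear chain x_causal -> y -> x_spurious, where the noise on the spurious feature differs by environment, and the names `make_env` and `slope` are illustrative. Comparing per-environment regression slopes is a crude stand-in for a proper invariance test.

```python
import numpy as np

def make_env(n, spurious_noise, rng):
    """One environment of the chain x_causal -> y -> x_spurious."""
    x_causal = rng.normal(0.0, 1.0, size=n)
    y = x_causal + rng.normal(0.0, 1.0, size=n)                # invariant mechanism
    x_spurious = y + rng.normal(0.0, spurious_noise, size=n)   # environment-specific
    return x_causal, x_spurious, y

def slope(feature, target):
    """Least-squares slope of target regressed on a single feature."""
    return np.cov(feature, target)[0, 1] / np.var(feature, ddof=1)

rng = np.random.default_rng(0)
envs = [make_env(50_000, s, rng) for s in (0.1, 2.0)]

causal_slopes = [slope(xc, y) for xc, xs, y in envs]      # both close to 1.0
spurious_slopes = [slope(xs, y) for xc, xs, y in envs]    # close to 1.0, then about 0.33

causal_gap = abs(causal_slopes[0] - causal_slopes[1])
spurious_gap = abs(spurious_slopes[0] - spurious_slopes[1])

print("causal gap:", causal_gap)
print("spurious gap:", spurious_gap)
```

In the first environment alone the spurious feature is the stronger predictor of `y`, which is the single-environment ambiguity described above. Only the comparison across environments reveals that its relationship to the target is unstable.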