HMV-CRL — PraCha

The Problem with Engagement Data

Most recommender systems train on engagement: clicks, watches, replays. The implicit assumption is that engagement signals preference. Often it does not. A user who watched three hours of auto-played content is not necessarily expressing a preference; they may simply have been caught in a loop. The algorithm chose for them, and the data recorded it as a choice.

This is not a noise problem. It is a confounding problem. Platform-driven exposure and genuine user preference are two different causes of the same observed behavior. Standard collaborative filtering cannot tell them apart, so it learns a mixture of both — optimizing for engagement that the platform itself produces, not for what users actually want. Change the algorithm, and the whole learned model shifts unpredictably.

The Idea

There are two surfaces in the data that reveal different things. Search interactions are user-initiated: the user typed a query and chose a result. That signal is closer to genuine preference. Recommendation interactions are platform-initiated: the system chose what to show, and the user may or may not have engaged. That signal is closer to exposure.

HMV-CRL uses both surfaces together to learn two disentangled representations: one capturing what the platform shaped, one capturing what the user actually wanted. The key is that each representation is anchored to a different set of covariates — different auxiliary variables make the latent factors separately identifiable without needing interventional data.
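
One way to write down the anchoring described above is as a conditionally factorized prior in the iVAE style. The notation here is an assumption made for illustration and is not taken from the paper: z_plat and z_pref are the two latent blocks, u_plat and u_pref are their respective auxiliary covariates, and c ranges over the two channels.

```latex
p(z_{\mathrm{plat}}, z_{\mathrm{pref}} \mid u_{\mathrm{plat}}, u_{\mathrm{pref}})
  = p(z_{\mathrm{plat}} \mid u_{\mathrm{plat}})\,
    p(z_{\mathrm{pref}} \mid u_{\mathrm{pref}}),
\qquad
p(z_c \mid u_c)
  = \prod_{i} \mathcal{N}\!\bigl(z_{c,i};\ \mu_{c,i}(u_c),\ \sigma_{c,i}^{2}(u_c)\bigr),
\quad c \in \{\mathrm{plat}, \mathrm{pref}\}
```

Under this reading, each block's prior moves only with its own covariates, and that variation is what would let the two blocks be told apart from observational data.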

The architecture consists of two Transformer encoders whose encodings of the search and recommendation sequences are fused via Product-of-Experts. Identifiability follows from the iVAE framework: the distinct auxiliary variables per channel provide the statistical variation needed to separate the latent factors. A learned causal graph over the representations then enables causal effect estimation: you can intervene on the platform factor while holding preference fixed.
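
For readers unfamiliar with Product-of-Experts fusion, here is a minimal sketch of the standard Gaussian case, where each encoder emits a diagonal Gaussian and the fused posterior is their precision-weighted product. The function name, the toy numbers, and the assumption that the experts are diagonal Gaussians are illustrative and may differ from the actual implementation.

```python
def product_of_experts(means, variances):
    """Fuse diagonal-Gaussian experts by multiplying their densities.

    means, variances: one list per expert, one value per latent dimension.
    Returns the mean and variance of the renormalised product Gaussian.
    """
    dims = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(dims):
        # Precision (inverse variance) is additive under a Gaussian product.
        precisions = [1.0 / v[d] for v in variances]
        total_precision = sum(precisions)
        weighted_sum = sum(p * m[d] for p, m in zip(precisions, means))
        fused_var.append(1.0 / total_precision)
        fused_mean.append(weighted_sum / total_precision)
    return fused_mean, fused_var


# Hypothetical 2-d posteriors from a search encoder and a recommendation encoder.
search_mu, search_var = [1.0, 0.0], [0.5, 2.0]
rec_mu, rec_var = [3.0, 0.0], [0.5, 2.0]

fused_mu, fused_var = product_of_experts(
    [search_mu, rec_mu], [search_var, rec_var]
)
print(fused_mu, fused_var)
```

The fused variance is never larger than that of the most confident expert, so a surface with little evidence for a given user contributes little to the fused estimate.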

What the Model Learned

On KuaiSAR, a large-scale dual-surface dataset, the platform representation recovered its target factors with MCC = 0.944. The preference representation, despite being harder to identify, reached MCC = 0.713. Critically, the exposure covariates predicted it with only R² = 0.046, which is close to zero. The preference representation did not absorb the exposure signal. It stayed selective.
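
Assuming MCC here denotes the mean correlation coefficient customary in the iVAE literature, it can be sketched as follows: match each learned dimension to one target dimension so that the summed absolute Pearson correlation is maximal, then average. The helper names and toy data are hypothetical, and the brute-force matching is chosen only to keep the example dependency-free.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def mean_correlation_coefficient(target_factors, learned_factors):
    """Best one-to-one matching of learned to target dimensions by |r|.

    Both arguments are lists of columns (one list of samples per dimension).
    Brute force over permutations is fine for a handful of dimensions; an
    assignment solver would be the usual choice for larger latent spaces.
    """
    k = len(target_factors)
    corr = [
        [abs(pearson(t, l)) for l in learned_factors] for t in target_factors
    ]
    best = max(
        sum(corr[i][perm[i]] for i in range(k))
        for perm in permutations(range(k))
    )
    return best / k


# Toy data: the learned dimensions are the targets, permuted and rescaled.
target_z = [[0.0, 1.0, 2.0, 3.0], [1.0, -1.0, 1.0, -1.0]]
learned_z = [[-2.0, 2.0, -2.0, 2.0], [0.0, 3.0, 6.0, 9.0]]
mcc = mean_correlation_coefficient(target_z, learned_z)
print(mcc)
```

Because the score ignores permutation, sign, and scale, it measures recovery up to exactly the ambiguities that iVAE-style identifiability leaves open.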

Under a simulated policy shift (a change in how the algorithm recommends), the preference representation barely moved while the platform representation shifted dramatically. This is what identifiability means in practice: genuine preferences should not change when the recommendation algorithm changes. Here, they did not.
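
One simple way to quantify how far a representation moves under a policy shift is the average distance between each user's embedding before and after the shift. This is an illustrative metric with made-up embeddings, not necessarily the measure used in the paper.

```python
import math


def mean_displacement(before, after):
    """Average Euclidean distance between matched per-user embeddings."""
    distances = [math.dist(b, a) for b, a in zip(before, after)]
    return sum(distances) / len(distances)


# Hypothetical 2-d embeddings for two users, before and after the shift.
preference_before = [[0.2, 0.1], [0.9, 1.0]]
preference_after = [[0.2, 0.2], [0.9, 1.1]]
platform_before = [[0.0, 0.0], [1.0, 1.0]]
platform_after = [[3.0, 4.0], [4.0, 5.0]]

preference_shift = mean_displacement(preference_before, preference_after)
platform_shift = mean_displacement(platform_before, platform_after)
print(preference_shift, platform_shift)
```

A preference representation that is invariant to the recommendation policy should show a displacement near zero, while the platform representation is expected to move.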

The Finding That Changes the Interpretation

Mediation analysis revealed something counterintuitive: most of the platform's indirect effect on engagement routes through the platform factor as a suppressor. The platform amplifies observed engagement while simultaneously suppressing the preference signal. You see more interaction, but less of what the user actually wants. By any naive engagement metric, the system looks like it is working. The causal model reveals it is substituting platform-induced behavior for genuine preference — and calling it success.

This is the kind of finding that requires causal structure to see at all. Without separating the two representations, the suppressor path is invisible.
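
To make the suppressor pattern concrete, here is a sketch of a linear mediation model using the product-of-coefficients decomposition. The coefficients are invented for illustration and are not estimates from the paper, and which variable plays the mediator role is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class LinearMediation:
    """Hypothetical linear model: exposure -> mediator -> engagement."""

    a: float         # effect of exposure on the mediator
    b: float         # effect of the mediator on engagement
    c_direct: float  # effect of exposure on engagement, bypassing the mediator

    @property
    def indirect(self):
        # Product-of-coefficients estimate of the mediated path.
        return self.a * self.b

    @property
    def total(self):
        return self.c_direct + self.indirect

    @property
    def is_suppression(self):
        # Suppression: the indirect path works against the direct path.
        return self.indirect * self.c_direct < 0

    def engagement(self, exposure, mediator=None):
        """Predicted engagement; pass `mediator` to hold it fixed."""
        if mediator is None:
            mediator = self.a * exposure
        return self.c_direct * exposure + self.b * mediator


# Illustrative coefficients only: exposure lowers the mediator, which
# itself raises engagement, so the indirect path is negative.
model = LinearMediation(a=-0.5, b=0.8, c_direct=1.0)

total_effect = model.engagement(1.0) - model.engagement(0.0)
controlled_direct = (
    model.engagement(1.0, mediator=0.0) - model.engagement(0.0, mediator=0.0)
)
print(model.indirect, total_effect, controlled_direct, model.is_suppression)
```

With these numbers, the total effect is smaller than the direct effect, so an analysis that looks only at total engagement cannot reveal that one path is working against the other.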

Why This Matters

Recommendation systems at scale have real effects on what information people encounter, what products they buy, what content shapes their views. If the optimization target is engagement conflated with exposure, the system is not serving users — it is producing engagement for itself. Separating these causes is not just a modeling improvement. It is a precondition for evaluating whether a recommender system is doing what it is supposed to do.

Code and paper are available on GitHub.