Poster in Workshop: 5th Workshop on Self-Supervised Learning: Theory and Practice

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Khimya Khetarpal · Daniel (Zhaohan) Guo · Bernardo Avila Pires · Yunhao Tang · Clare Lyle · Mark Rowland · Nicolas Heess · Diana Borsa · Arthur Guez · Will Dabney


Abstract: Learning a good representation is a crucial challenge for reinforcement learning (RL) agents. Self-predictive algorithms jointly learn a latent representation and a dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model under the simplifying assumption of a fixed policy (BYOL-$\Pi$); this assumption is at odds with practical implementations, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework. Interestingly, we uncover that BYOL-$\Pi$ and BYOL-AC are related through the lens of variance. We unify the study of these objectives through two complementary lenses: a model-based perspective, where each objective is related to a low-rank approximation of certain dynamics, and a model-free perspective, which relates the objectives to modified value, Q-value, and advantage functions. This mismatch with the true value functions helps explain our empirical observation (in both linear and deep RL experiments) that BYOL-$\Pi$ and BYOL-AC either perform very similarly across many tasks or differ in a task-dependent way.
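For concreteness, a minimal sketch of the two objectives in a simplified linear setting (the symbols $\varphi$, $P$, $P_a$ and the squared-error form here are illustrative assumptions, not the paper's exact formulation): given a transition $(s, a, s')$ generated under the policy $\pi$, a latent encoder $\varphi$, linear predictors, and a stop-gradient target $\mathrm{sg}(\cdot)$,

$$
\mathcal{L}_{\Pi}(\varphi, P) \;=\; \mathbb{E}_{s,\,a \sim \pi,\, s'}\!\left[\big\| P\,\varphi(s) - \mathrm{sg}\big(\varphi(s')\big) \big\|_2^2\right],
\qquad
\mathcal{L}_{\mathrm{AC}}(\varphi, \{P_a\}) \;=\; \mathbb{E}_{s,\,a \sim \pi,\, s'}\!\left[\big\| P_a\,\varphi(s) - \mathrm{sg}\big(\varphi(s')\big) \big\|_2^2\right].
$$

BYOL-$\Pi$ uses a single policy-averaged predictor $P$, whereas BYOL-AC learns one predictor $P_a$ per action, matching the action-conditioned predictions used in practical implementations.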
