Poster in Workshop: Causal Representation Learning
Learning Endogenous Representation in Reinforcement Learning via Advantage Estimation
Hsiao-Ru Pan · Bernhard Schölkopf
Keywords: [ causal effect ] [ reinforcement learning ] [ ExoMDP ]
Recently, it was shown that the advantage function can be understood as quantifying the causal effect of an action on the cumulative reward. However, this connection has remained largely analogical, with unclear implications. In the present work, we examine this analogy using the Exogenous Markov Decision Process (ExoMDP) framework, which factorizes an MDP into variables that are causally affected by the agent's actions (endogenous) and variables that are beyond the agent's control (exogenous). We demonstrate that the advantage function can be expressed using only the endogenous variables, which is, in general, not possible for the (action-)value function. Through experiments in a toy ExoMDP, we find that estimating the advantage function directly can facilitate learning representations that are invariant to the exogenous variables.
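Concretely, the exogenous contribution cancels in the advantage. A minimal derivation sketch, assuming the fully factored ExoMDP with transition $p(x', e' \mid x, e, a) = p(x' \mid x, a)\, p(e' \mid e)$ and additive reward $r(s, a) = r_x(x, a) + r_e(e)$, where $s = (x, e)$ splits into endogenous $x$ and exogenous $e$ (the symbols $x$, $e$, $r_x$, $r_e$, $V_e$ are notation for this sketch, not necessarily the paper's):

\[
V_e(e) \;=\; \mathbb{E}\Big[\textstyle\sum_{t \ge 0} \gamma^t\, r_e(e_t) \,\Big|\, e_0 = e\Big],
\]
which is independent of the policy and the action because $p(e' \mid e)$ ignores both. The value functions then decompose as
\[
Q^\pi(s, a) = Q_x^\pi(x, a) + V_e(e), \qquad V^\pi(s) = V_x^\pi(x) + V_e(e),
\]
so the advantage
\[
A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s) = Q_x^\pi(x, a) - V_x^\pi(x)
\]
is a function of the endogenous variables alone, while $Q^\pi$ and $V^\pi$ each retain the exogenous term $V_e(e)$.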
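To illustrate the last claim, below is one plausible way to estimate the advantage directly; the network, the one-step objective, and all names (AdvantageNet, one_step_loss) are hypothetical illustrations, not the authors' implementation.

# Hypothetical sketch: fit an advantage head directly on transitions from
# a toy ExoMDP. The architecture, objective, and hyperparameters are
# illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn

class AdvantageNet(nn.Module):
    """Maps a state to one advantage per action, centered under the policy."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, policy_probs):
        raw = self.body(state)
        # Enforce sum_a pi(a|s) * A(s, a) = 0 by subtracting the
        # policy-weighted mean, so the head behaves like an advantage.
        return raw - (policy_probs * raw).sum(-1, keepdim=True)

def one_step_loss(adv_net, value_net, batch, policy_probs, gamma=0.99):
    # Regress the one-step target r + gamma * V(s') onto V(s) + A(s, a):
    # the centered advantage head absorbs the action-dependent (endogenous)
    # part of the target, while V soaks up the rest, including any
    # exogenous contribution.
    s, a, r, s_next = batch
    target = r + gamma * value_net(s_next).squeeze(-1).detach()
    adv = adv_net(s, policy_probs).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    pred = value_net(s).squeeze(-1) + adv
    return ((target - pred) ** 2).mean()

Because the advantage target contains no exogenous term (see the derivation above), an encoder trained through such an objective has no incentive to represent the exogenous variables, which is one way to read the invariance result.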