NeurIPS Reward-Relevance-Filtered Linear Offline Reinforcement Learning

Poster in Workshop: Causal Representation Learning

Angela Zhou

Keywords: [ offline reinforcement learning ] [ causal feature selection ]

Abstract: This paper studies causal variable selection in the setting of a Markov decision process, specifically offline reinforcement learning with linear function approximation. The structural restrictions on the data-generating process presume that the transitions factor into sparse dynamics that affect the reward and additional exogenous dynamics that do not. Although the minimally sufficient adjustment set for estimating full-state transition properties depends on the whole state, the optimal policy, and therefore the state-action value function, is sparse. This is a novel notion of "causal sparsity" that does not occur in pure estimation settings. We develop methods for filtering the estimation of the state-action value function to the sparse component by modifying thresholded lasso: we use thresholded lasso to recover the support of the rewards, and then use this estimated support to estimate the state-action $Q$ function. Such a method has sample complexity depending only on the size of the sparse component. Although this problem differs from the typical statement of "causal representation learning", this notion of "causal sparsity" may be of interest, and our methods connect to the classical statistics literature with theoretical guarantees, which can be a stepping stone toward more complex representation learning.
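
The two-stage idea in the abstract (recover the reward support with thresholded lasso, then estimate $Q$ using only that support) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's estimator: the synthetic data, the function names (`lasso_ista`, `thresholded_support`, `fitted_q_on_support`), the penalty `lam`, the threshold `tau`, the simple proximal-gradient lasso solver, and the use of plain least-squares fitted-Q iteration for the second stage.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso by proximal gradient: min_b (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, d = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return b

def thresholded_support(X, y, lam, tau):
    """Thresholded lasso: keep coordinates whose lasso coefficient exceeds tau."""
    return np.flatnonzero(np.abs(lasso_ista(X, y, lam)) > tau)

def fitted_q_on_support(S, A, R, S_next, support, n_actions, gamma=0.9, n_rounds=50):
    """Least-squares fitted-Q iteration using only the selected state coordinates."""
    ones = np.ones((S.shape[0], 1))
    Z = np.hstack([ones, S[:, support]])            # intercept + filtered features
    Z_next = np.hstack([ones, S_next[:, support]])
    W = np.zeros((n_actions, Z.shape[1]))           # one weight vector per action
    for _ in range(n_rounds):
        target = R + gamma * (Z_next @ W.T).max(axis=1)
        for a in range(n_actions):
            mask = A == a
            W[a] = np.linalg.lstsq(Z[mask], target[mask], rcond=None)[0]
    return W

# Synthetic offline data (hypothetical): only the first k of d state
# coordinates enter the reward; the remaining ones are exogenous.
rng = np.random.default_rng(0)
n, d, k, n_actions = 2000, 20, 3, 2
S = rng.normal(size=(n, d))
A = rng.integers(0, n_actions, size=n)
theta = np.zeros((n_actions, d))
theta[0, :k] = [1.0, -1.0, 0.5]
theta[1, :k] = [-0.5, 1.0, 1.0]
R = (S * theta[A]).sum(axis=1) + 0.1 * rng.normal(size=n)
S_next = 0.5 * S + 0.5 * rng.normal(size=(n, d))    # coordinates evolve separately

# Stage 1: recover the reward support per action and take the union.
support = np.array([], dtype=int)
for a in range(n_actions):
    mask = A == a
    support = np.union1d(support, thresholded_support(S[mask], R[mask], lam=0.05, tau=0.1))

# Stage 2: estimate Q on the recovered support only.
W = fitted_q_on_support(S, A, R, S_next, support, n_actions)
print("estimated support:", support)
print("Q weights (intercept first):", W)
```

In this sketch the second-stage regression has dimension equal to the size of the recovered support rather than the full state dimension `d`, which mirrors the abstract's point that sample complexity should depend only on the sparse component.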
