Poster in Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad"
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
Sunil Madhow · Dan Qiao · Yu-Xiang Wang
Offline RL is an important step towards making data-hungry RL algorithms more widely usable in the real world, but conventional assumptions on the distribution of logged data do not apply in some key real-world scenarios. In particular, it is unrealistic to assume that RL practitioners will have access to sets of trajectories that are simultaneously mutually independent and well-exploring. We propose two natural ways to relax these assumptions: first, by allowing trajectories to be collected independently by different logging policies; and second, by allowing logging policies to depend on past trajectories. We discuss Offline Policy Evaluation (OPE) in these settings, analyzing the performance of a model-based OPE estimator when the MDP is tabular.
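To make the estimator concrete, below is a minimal sketch of a generic model-based (plug-in) OPE procedure for a tabular, finite-horizon MDP: fit an empirical transition and reward model from the logged transitions, then evaluate the target policy on that model by dynamic programming. This is an illustration of the general technique, not the authors' implementation; the function name, dataset layout, and the convention for unvisited state-action pairs are assumptions made for the sketch.

```python
import numpy as np

def model_based_ope(dataset, target_policy, n_states, n_actions, horizon):
    """Plug-in OPE estimate for a tabular, finite-horizon MDP (illustrative sketch).

    dataset: iterable of (s, a, r, s_next) transitions, possibly collected
        adaptively by several logging policies.
    target_policy: array of shape (horizon, n_states, n_actions); each row is a
        distribution over actions.
    Returns an estimate of the target policy's expected return from state 0.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in dataset:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r

    visits = counts.sum(axis=2)                   # N(s, a)
    safe = np.maximum(visits, 1)                  # avoid division by zero
    P_hat = counts / safe[:, :, None]             # empirical transition kernel
    r_hat = reward_sum / safe                     # empirical mean rewards
    # Note: unvisited (s, a) pairs get zero reward and a zero transition row
    # here; this is a simplification, not necessarily the paper's convention.

    # Finite-horizon policy evaluation on the estimated model.
    V = np.zeros(n_states)
    for h in reversed(range(horizon)):
        Q = r_hat + P_hat @ V                     # Q_h(s, a) under the model
        V = np.sum(target_policy[h] * Q, axis=1)  # V_h(s) under the target policy
    return V[0]
```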