Poster
in
Workshop: Adaptive Experimental Design and Active Learning in the Real World
Active Learning for Iterative Offline Reinforcement Learning
Lan Zhang · Luigi Franco Tedesco · Pankaj Rajak · Youcef Zemmouri · Hakan Brunzell
Offline Reinforcement Learning (RL) has emerged as a promising approach to address real-world challenges where online interactions with the environment are limited, risky, or costly. Although recent advances produce high-quality policies from offline data, there is currently no systematic methodology to continue improving them without resorting to online fine-tuning. This paper proposes to repurpose Offline RL to produce a sequence of improving policies, namely Iterative Offline Reinforcement Learning (IORL). To produce such a sequence, IORL has to cope with imbalanced offline datasets and perform controlled environment exploration. Specifically, we introduce "Return-based Sampling" as a means to selectively prioritize experience from high-return trajectories, and active-learning-driven "Dataset Uncertainty Sampling" to probe state-actions with probability inversely proportional to their density in the dataset. We demonstrate that our proposed approach produces policies that achieve monotonically increasing average returns, from 65.4 to 140.2, in the Atari environment.
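The abstract does not spell out how the two samplers are implemented. As a rough illustration only, the sketch below uses a softmax over trajectory returns for Return-based Sampling and inverse-density weights for Dataset Uncertainty Sampling; the function names, the temperature parameter, and the density estimates are assumptions made here for illustration, not the authors' method.

```python
import numpy as np

def return_based_sampling_weights(trajectory_returns, temperature=1.0):
    """Softmax over trajectory returns: higher-return trajectories are
    sampled more often. The temperature knob is a hypothetical detail,
    not specified in the abstract."""
    r = np.asarray(trajectory_returns, dtype=np.float64)
    z = (r - r.max()) / temperature          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def dataset_uncertainty_sampling_weights(state_action_density):
    """Weights inversely proportional to the estimated density of each
    state-action pair in the offline dataset, so rarely seen state-actions
    are probed more often during controlled exploration."""
    d = np.asarray(state_action_density, dtype=np.float64)
    w = 1.0 / np.maximum(d, 1e-8)            # guard against division by zero
    return w / w.sum()

# Toy example: 4 trajectories and 5 state-action bins.
traj_weights = return_based_sampling_weights([10.0, 42.0, 7.0, 65.0])
explore_weights = dataset_uncertainty_sampling_weights([0.40, 0.30, 0.20, 0.08, 0.02])

rng = np.random.default_rng(0)
sampled_traj = rng.choice(4, size=8, p=traj_weights)    # biased toward high-return trajectories
sampled_sa = rng.choice(5, size=8, p=explore_weights)   # biased toward low-density state-actions
print(traj_weights, explore_weights, sampled_traj, sampled_sa)
```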