Poster in Workshop: Offline Reinforcement Learning
Offline Reinforcement Learning with Munchausen Regularization
Hsin-Yu Liu · Bharathan Balaji · Dezhi Hong
Most temporal-difference (TD)-based Reinforcement Learning (RL) methods bootstrap by replacing the true value of the next state with their current estimate of that value. Munchausen RL (M-RL) proposes additionally leveraging the current policy to bootstrap RL. The idea of penalizing consecutive policies that are far from each other also applies to the offline setting. In our work, we add the Munchausen term to the Q-update step to penalize policies that deviate too far from the previous policy. Our results indicate that this regularization can be incorporated into various offline Q-learning methods to improve their performance. In addition, we evaluate how prioritized experience replay affects offline RL. Our results show that Munchausen offline RL outperforms the original methods without the regularization term.
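To make the regularization concrete, the following is a minimal sketch of such a Q-update target, assuming the standard Munchausen-DQN formulation of Vieillard et al. (2020) rather than the authors' exact offline variant. Here $\tau$ is the entropy temperature, $\alpha \in [0, 1]$ scales the Munchausen (log-policy) term, and $\pi_{\bar\theta} = \mathrm{softmax}(q_{\bar\theta}/\tau)$ is the policy induced by the target network $q_{\bar\theta}$:

$$\hat{q}(r_t, s_{t+1}) = r_t + \alpha\,\tau \ln \pi_{\bar\theta}(a_t \mid s_t) + \gamma \sum_{a' \in \mathcal{A}} \pi_{\bar\theta}(a' \mid s_{t+1})\,\big(q_{\bar\theta}(s_{t+1}, a') - \tau \ln \pi_{\bar\theta}(a' \mid s_{t+1})\big)$$

The added $\alpha\,\tau \ln \pi_{\bar\theta}(a_t \mid s_t)$ term is what implicitly penalizes the KL divergence between successive policies; in practice the log-policy is clipped to $[l_0, 0]$, with $l_0 < 0$, for numerical stability.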