Poster
in
Workshop: Intrinsically Motivated Open-ended Learning (IMOL) Workshop
Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
Adriana Hugessen · Roger Creus Castanyer · Glen Berseth
Keywords: [ intrinsic motivation ] [ Reinforcement Learning ]
Both surprise-minimizing and surprise-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method can perform well across all entropy regimes. In an effort to find a single surprise-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective depending on the entropy conditions it faces, by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit which captures the ability of the agent to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes.