

Poster in Workshop: Intrinsically Motivated Open-ended Learning (IMOL)

Bayesian Online Non-Stationary Detection for Robust Reinforcement Learning

Alexander Shmakov · Pankaj Rajak · Yuhao Feng · Wojciech Kowalinski · Fei Wang

Keywords: [ Reinforcement Learning ] [ Non-Stationary Reinforcement Learning ] [ Bayesian Online Change-Point Detection ]


Abstract:

Reinforcement Learning (RL) has achieved state-of-the-art performance in stationary environments with effective simulators. However, lifelong and open-world RL applications, such as robotics, stock trading, and recommendation systems, change over time, sometimes in adversarial ways. Non-stationary environments pose challenges for RL agents because the data distribution continually shifts away from the training distribution, causing performance to deteriorate. We propose a robust Bayesian online detector that tracks agent performance to detect non-stationarities in the environment. Additionally, we propose a new metric, hindsight approximate reward (HAR), which relies solely on state and action information to detect adversarial changes in the environment, making it well suited for real-world settings with missing or delayed feedback. We demonstrate that the proposed Bayesian detector, using either HAR or expected reward as its metric, detects a range of non-stationary changes in dynamic control tasks more reliably than baseline non-stationarity tests.
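The abstract describes running a Bayesian online change-point detector over a performance metric such as expected reward or HAR. The sketch below is a minimal illustration of that idea, assuming a standard Bayesian online change-point detection formulation (Adams & MacKay, 2007) with Gaussian observations of known variance; the class and parameter names (RewardChangeDetector, hazard, obs_var) are hypothetical and not taken from the paper.

```python
"""Minimal sketch: Bayesian online change-point detection over a stream of
per-episode reward-like metrics.  Assumes Gaussian observations with known
variance and a Gaussian prior on the metric's mean (a simplifying assumption,
not necessarily the paper's exact model)."""
import numpy as np
from scipy.stats import norm


class RewardChangeDetector:
    """Maintains a run-length posterior over a scalar metric and flags a
    likely non-stationarity when a recent change-point has high probability."""

    def __init__(self, hazard=1 / 100, obs_var=1.0, prior_mean=0.0, prior_var=10.0):
        self.hazard = hazard          # prior probability of a change at each step
        self.obs_var = obs_var        # assumed (known) observation noise variance
        self.prior_mean = prior_mean
        self.prior_var = prior_var
        # Run-length posterior and per-run-length posterior over the metric's mean.
        self.run_probs = np.array([1.0])
        self.means = np.array([prior_mean])
        self.vars = np.array([prior_var])

    def update(self, x):
        """Ingest one metric value (e.g. episode reward or HAR) and return the
        posterior probability that a change-point occurred within the last few steps."""
        # Predictive probability of x under each run-length hypothesis.
        pred = norm.pdf(x, loc=self.means, scale=np.sqrt(self.vars + self.obs_var))
        # Growth (no change) and change-point probability masses.
        growth = self.run_probs * pred * (1.0 - self.hazard)
        cp_mass = np.sum(self.run_probs * pred * self.hazard)
        new_run_probs = np.concatenate(([cp_mass], growth))
        new_run_probs /= new_run_probs.sum()
        # Conjugate Gaussian update of the mean for every surviving run length.
        post_vars = 1.0 / (1.0 / self.vars + 1.0 / self.obs_var)
        post_means = post_vars * (self.means / self.vars + x / self.obs_var)
        self.means = np.concatenate(([self.prior_mean], post_means))
        self.vars = np.concatenate(([self.prior_var], post_vars))
        self.run_probs = new_run_probs
        # Probability that the current run started within the last 5 steps.
        return self.run_probs[:5].sum()


# Toy usage: a simulated reward stream whose mean shifts halfway through.
rng = np.random.default_rng(0)
rewards = np.concatenate([rng.normal(1.0, 1.0, 200), rng.normal(-1.0, 1.0, 200)])
detector = RewardChangeDetector()
for t, r in enumerate(rewards):
    p_change = detector.update(r)
    # Skip the first few steps, where short run lengths are trivially probable.
    if t > 20 and p_change > 0.9:
        print(f"step {t}: possible non-stationarity (p={p_change:.2f})")
```

In this sketch the hazard rate encodes the prior frequency of environment changes, and the metric fed to update() could equally be a HAR value computed from states and actions when reward feedback is missing or delayed, which is the setting the abstract highlights.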
