Poster
A Simple Framework for Generalization in Visual RL under Dynamic Scene Perturbations
Wonil Song · Hyesong Choi · Kwanghoon Sohn · Dongbo Min
West Ballroom A-D #6405
In the rapidly evolving domain of vision-based deep reinforcement learning (RL), a pivotal challenge is to achieve generalization capability to dynamic environmental changes reflected in visual observations.Our work delves into the intricacies of this problem, identifying two key issues that appear in previous approaches for visual RL generalization: (i) imbalanced saliency and (ii) observational overfitting.Imbalanced saliency is a phenomenon where an RL agent disproportionately identifies salient features across consecutive frames in a frame stack. Observational overfitting occurs when the agent focuses on certain background regions rather than task-relevant objects.To address these challenges, we present a simple yet effective framework for generalization in visual RL (SimGRL) under dynamic scene perturbations.First, to mitigate the imbalanced saliency problem, we introduce an architectural modification to the image encoder to stack frames at the feature level rather than the image level.Simultaneously, to alleviate the observational overfitting problem, we propose a novel technique called shifted random overlay augmentation, which is specifically designed to learn robust representations capable of effectively handling dynamic visual scenes.Extensive experiments demonstrate the superior generalization capability of SimGRL, achieving state-of-the-art performance in benchmarks including the DeepMind Control Suite.