Poster
in
Workshop: Goal-Conditioned Reinforcement Learning
Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets
Anirudhan Badrinath · Allen Nie · Yannis Flet-Berliac · Emma Brunskill
Keywords: [ reinforcement learning via supervised learning ] [ offline reinforcement learning ] [ goal-conditioning ] [ behaviour cloning ]
Despite the recent advancements in offline reinforcement learning via supervised learning (RvS) methods and the success of the decision transformer (DT) architecture in various domains, DTs have proven to fall short in challenging benchmarks. The root cause of this underperformance lies in their inability to seamlessly connect segments of suboptimal trajectories, i.e., stitch, leading to poor performance. To overcome these limitations, we present a novel approach to enhance RvS methods by integrating intermediate targets. We introduce the waypoint transformer (WT), using an architecture that builds upon the DT framework and is further conditioned on dynamically-generated waypoints. The results show a significant improvement in the final return compared to existing RvS methods, with performance on par or greater than existing temporal difference learning-based methods. Additionally, the performance and stability is significantly improvedin the most challenging environments and data configurations, including AntMaze Large Play/Diverse and Kitchen Mixed/Partial.