Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning
Ensemble-based Offline Reinforcement Learning with Adaptive Behavior Cloning
Danyang Wang · Lingsong Zhang
In this work, we build upon the offline reinforcement learning algorithm TD3+BC \cite{fujimoto2021minimalist} and propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thereby mitigating overestimation. Moreover, we introduce a simple and convenient mechanism for controlling the degree of BC: a Bernoulli random variable whose parameter is set by a user-specified confidence level, allowing the BC strength to adapt to different offline datasets. Our proposed algorithm, Ensemble-based Actor-Critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL MuJoCo benchmarks.
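To make the two ingredients of the abstract concrete, the sketch below illustrates (i) a pessimistic value estimate from an ensemble of Q-networks and (ii) a Bernoulli-gated BC term in a TD3+BC-style actor loss. This is a hedged illustration under stated assumptions, not the authors' released code: the mean-minus-one-standard-deviation penalty is one common uncertainty statistic (the paper's exact aggregation may differ), and all names (`num_q`, `conf_level`, `conservative_q`, `actor_loss`) are hypothetical.

```python
# Hedged sketch of the two ideas in the abstract; not the authors' implementation.
import torch

num_q = 10        # hypothetical ensemble size
conf_level = 0.9  # user-specified confidence level (hypothetical name)

def conservative_q(q_values: torch.Tensor) -> torch.Tensor:
    """Pessimistic value estimate from an ensemble of Q estimates.

    q_values: (num_q, batch) tensor of per-network estimates.
    Mean minus one standard deviation is one common penalty; the paper
    may use a different statistic (e.g., a minimum or a quantile).
    """
    return q_values.mean(dim=0) - q_values.std(dim=0)

def actor_loss(q_pi: torch.Tensor, pi_action: torch.Tensor,
               data_action: torch.Tensor) -> torch.Tensor:
    """TD3+BC-style actor loss with a Bernoulli-gated BC term."""
    # Draw once per update: with probability conf_level, keep the BC term.
    use_bc = torch.bernoulli(torch.tensor(conf_level)).item()
    lam = 1.0 / q_pi.abs().mean().detach()        # TD3+BC Q-normalization
    bc = ((pi_action - data_action) ** 2).mean()  # behavior-cloning penalty
    return -(lam * q_pi).mean() + use_bc * bc
```

Under this reading, a higher confidence level keeps the BC term active on more updates (appropriate when the dataset is trusted), while a lower level lets the policy deviate from the data more often.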