Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning
Ensemble-based Offline Reinforcement Learning with Adaptive Behavior Cloning
Danyang Wang · Lingsong Zhang
In this work, we build upon the offline reinforcement learning algorithm TD3+BC \cite{fujimoto2021minimalist} and propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thereby mitigating overestimation. Moreover, we introduce a simple and convenient mechanism for controlling the degree of BC: a Bernoulli random variable whose parameter is set by a user-specified confidence level, allowing the BC strength to adapt to different offline datasets. Our proposed algorithm, Ensemble-based Actor-Critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL MuJoCo benchmarks.
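To make the two ingredients of the abstract concrete, the sketch below illustrates (i) a pessimistic value estimate from an ensemble of Q-networks and (ii) a Bernoulli-gated BC term in a TD3+BC-style actor loss. This is a hedged illustration under stated assumptions, not the authors' released code: the mean-minus-one-standard-deviation penalty is one common uncertainty statistic (the paper's exact aggregation may differ), and all names (`num_q`, `conf_level`, `conservative_q`, `actor_loss`) are hypothetical.

```python
# Hedged sketch of the two ideas in the abstract; not the authors' implementation.
import torch

num_q = 10        # hypothetical ensemble size
conf_level = 0.9  # user-specified confidence level (hypothetical name)

def conservative_q(q_values: torch.Tensor) -> torch.Tensor:
    """Pessimistic value estimate from an ensemble of Q estimates.

    q_values: (num_q, batch) tensor of per-network estimates.
    Mean minus one standard deviation is one common penalty; the paper
    may use a different statistic (e.g., a minimum or a quantile).
    """
    return q_values.mean(dim=0) - q_values.std(dim=0)

def actor_loss(q_pi: torch.Tensor, pi_action: torch.Tensor,
               data_action: torch.Tensor) -> torch.Tensor:
    """TD3+BC-style actor loss with a Bernoulli-gated BC term."""
    # Draw once per update: with probability conf_level, keep the BC term.
    use_bc = torch.bernoulli(torch.tensor(conf_level)).item()
    lam = 1.0 / q_pi.abs().mean().detach()        # TD3+BC Q-normalization
    bc = ((pi_action - data_action) ** 2).mean()  # behavior-cloning penalty
    return -(lam * q_pi).mean() + use_bc * bc
```

Under this reading, a higher confidence level keeps the BC term active on more updates (appropriate when the dataset is trusted), while a lower level lets the policy deviate from the data more often.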