Poster
Parallelizing Model-based Reinforcement Learning Over the Sequence Length
Zirui Wang · Yue DENG · Junfeng Long · Yin Zhang
West Ballroom A-D #6506
Recently, Model-based Reinforcement Learning (MBRL) methods have demonstrated stunning sample efficiency in various RL domains.However, achieving this extraordinary sample efficiency comes with additional training costs in terms of computations, memory, and training time.To address these challenges, we propose the Parallelized Model-based Reinforcement Learning (PaMoRL) framework.PaMoRL introduces two novel techniques: the Parallel World Model (PWM) and the Parallelized Eligibility Trace Estimation (PETE) to parallelize both model learning and policy learning stages of current MBRL methods over the sequence length.Our PaMoRL framework is hardware-efficient and stable, and it can be applied to various tasks with discrete or continuous action spaces using a single set of hyperparameters.The empirical results demonstrate that the PWM and PETE within PaMoRL significantly increase training speed without sacrificing inference efficiency.In terms of sample efficiency, PaMoRL maintains an MBRL-level sample efficiency that outperforms other no-look-ahead MBRL methods and model-free RL methods, and it even exceeds the performance of planning-based MBRL methods and methods with larger networks in certain tasks.
Live content is unavailable. Log in and register to view live content