

Spotlight Poster

DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

Weikang Wan · Ziyu Wang · Yufei Wang · Zackory Erickson · David Held

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract: This paper introduces DiffTOP, which utilizes $\textbf{Diff}$erentiable $\textbf{T}$rajectory $\textbf{OP}$timization as the policy representation to generate actions for deep reinforcement and imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTOP addresses the “objective mismatch” issue of prior model-based RL algorithms, as the dynamics model in DiffTOP is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTOP for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feedforward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTOP outperforms prior state-of-the-art methods in both domains.
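
The idea of computing gradients of a loss with respect to the parameters of trajectory optimization can be illustrated with a deliberately tiny sketch. It is not the DiffTOP implementation: it assumes a scalar state, a one-step horizon, fixed dynamics `x_next = x + u`, and a quadratic cost with a single learnable parameter `goal`, so the optimizer has a closed-form solution that can be differentiated by hand. All names (`plan`, `plan_grad_goal`, `fit_goal`, `demos`) are hypothetical and chosen for illustration only.

```python
# Toy sketch (an assumption-laden illustration, not the authors' method):
# learn a cost parameter by differentiating an imitation loss through a
# one-step trajectory optimizer.

def plan(x, goal, r=0.1):
    """Return argmin_u (x + u - goal)^2 + r * u^2 in closed form.

    Assumed dynamics: x_next = x + u. `goal` is the learnable cost parameter.
    Setting the derivative 2(x + u - goal) + 2 r u to zero gives the result.
    """
    return (goal - x) / (1.0 + r)


def plan_grad_goal(r=0.1):
    """Derivative of the optimal action with respect to `goal`."""
    return 1.0 / (1.0 + r)


def fit_goal(demos, goal=0.0, lr=0.2, steps=200, r=0.1):
    """Gradient descent on the mean squared error between planned and expert actions."""
    for _ in range(steps):
        grad = 0.0
        for x, u_expert in demos:
            u_star = plan(x, goal, r)
            # Chain rule: dL/dgoal = dL/du* * du*/dgoal
            grad += 2.0 * (u_star - u_expert) * plan_grad_goal(r)
        goal -= lr * grad / len(demos)
    return goal


# Hypothetical demonstrations from an expert whose true goal is 2.0.
demos = [(x, plan(x, 2.0)) for x in (-1.0, 0.0, 0.5, 3.0)]
learned_goal = fit_goal(demos)
```

In this toy setting the learned `goal` moves toward the expert's value because the imitation loss is backpropagated through the planner's solution rather than through a separate prediction objective. Realistic problems have multi-step horizons and learned dynamics, where the gradient of the optimizer's solution is typically obtained with a differentiable optimization library instead of a hand-derived formula.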
