Poster in Workshop: Deep Reinforcement Learning Workshop
Constrained Imitation Q-learning with Earth Mover’s Distance reward
Wenyan Yang · Nataliya Strokina · Joni Pajarinen · Joni-Kristian Kamarainen
We propose constrained Earth Mover's Distance (CEMD) Imitation Q-learning, which combines the exploration power of Reinforcement Learning (RL) with the sample efficiency of Imitation Learning (IL). Sample efficiency makes Imitation Q-learning a suitable approach for robot learning. For Q-learning, immediate rewards can be efficiently computed by a greedy variant of the Earth Mover's Distance (EMD) between observed state-action pairs and the state-action pairs in stored expert demonstrations. In CEMD, we constrain the otherwise non-stationary greedy EMD reward by proposing a greedy EMD upper bound estimate and a generic Q-learning lower bound. In PyBullet continuous control benchmarks, CEMD is more sample efficient, achieves higher performance, and exhibits lower variance than competing methods.
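To make the greedy EMD reward concrete, the following is a minimal NumPy sketch of one plausible reading of the matching step: each observed state-action pair is greedily matched to its nearest unmatched expert sample under a Euclidean cost, and the negative matching cost serves as the immediate reward, clipped to a bounded range. The function name `greedy_emd_reward`, the Euclidean cost, and the placeholder bounds `r_min`/`r_max` (standing in for the paper's greedy EMD upper bound estimate and Q-learning lower bound) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def greedy_emd_reward(obs_sa, expert_sas, available, r_min=-1.0, r_max=0.0):
    """Greedy EMD-style immediate reward (illustrative sketch only).

    obs_sa:      observed state-action pair, shape (d,)
    expert_sas:  expert state-action pairs from stored demonstrations, (n, d)
    available:   boolean mask of expert samples not yet matched; the greedy
                 matching consumes each expert sample at most once
    r_min/r_max: placeholder bounds standing in for the paper's greedy EMD
                 upper bound estimate and generic Q-learning lower bound
    """
    idx = np.flatnonzero(available)                           # unmatched expert samples
    dists = np.linalg.norm(expert_sas[idx] - obs_sa, axis=1)  # Euclidean cost (assumed)
    j = int(np.argmin(dists))
    available[idx[j]] = False                                 # greedily consume the match
    # Negative transport cost as immediate reward, clipped to bounded range.
    return float(np.clip(-dists[j], r_min, r_max))

# Usage: reward one observed transition against a 100-sample demonstration.
expert = np.random.randn(100, 8)                              # hypothetical (s, a) features
mask = np.ones(100, dtype=bool)
r = greedy_emd_reward(np.random.randn(8), expert, mask)
```

Resetting the `available` mask at episode boundaries keeps the greedy matching consistent with episodic demonstrations; without the clipping, the reward scale drifts as expert samples are consumed, which is the non-stationarity the abstract refers to.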