Poster in Workshop: Deep Reinforcement Learning Workshop
Neural All-Pairs Shortest Path for Reinforcement Learning
Cristina Pinneri · Georg Martius · Andreas Krause
Having an informative and dense reward function is an important requirement to efficiently solve goal-reaching tasks. While the natural reward for such tasks is a binary signal indicating success or failure, providing only a binary reward makes learning very challenging given the sparsity of the feedback. Hence, introducing dense rewards helps to provide smooth gradients. However, these functions are not readily available, and constructing them is difficult, as it often requires a lot of time and domain-specific knowledge, and can unintentionally create spurious local minima. We propose a method that learns neural all-pairs shortest paths, used as a distance function to learn a policy for goal-reaching tasks, requiring zero domain-specific knowledge. In particular, our approach combines a self-supervised signal from the temporal distance between state pairs of an episode with a metric-based regularizer that leverages the triangle inequality to provide additional connectivity information between state triples. This dynamical distance can either be used as a cost function or reshaped as a reward and, differently from previous work, is fully self-supervised, compatible with off-policy learning, and robust to local minima.
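To make the two training signals described in the abstract concrete, here is a minimal PyTorch-style sketch of one plausible form of the loss: a regression of a learned distance network onto the temporal gap between two states of the same episode, plus a hinge penalty on triples of states that violate the triangle inequality. All names (`DistanceNet`, `lambda_tri`, the architecture, and the loss weighting) are hypothetical illustrations, not the paper's actual implementation, which the abstract does not specify.

```python
import torch
import torch.nn as nn


class DistanceNet(nn.Module):
    """Hypothetical network d_phi(s, g) approximating the shortest-path
    distance between a state s and a goal state g."""

    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # distances are non-negative
        )

    def forward(self, s: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, g], dim=-1)).squeeze(-1)


def distance_loss(d, s_t, s_tp, dt, s1, s2, s3, lambda_tri: float = 1.0):
    """Assumed form of the two signals from the abstract.

    1) Self-supervised temporal signal: for states (s_t, s_tp) sampled from
       the same episode dt steps apart, regress d(s_t, s_tp) onto dt.
    2) Metric regularizer: for state triples (s1, s2, s3), penalize
       violations of the triangle inequality
       d(s1, s3) <= d(s1, s2) + d(s2, s3).
    """
    temporal = ((d(s_t, s_tp) - dt) ** 2).mean()
    violation = d(s1, s3) - (d(s1, s2) + d(s2, s3))
    triangle = torch.relu(violation).mean()  # hinge: only violations cost
    return temporal + lambda_tri * triangle
```

Under this reading, the learned distance can then serve directly as a planning cost or be reshaped into a dense reward such as the negative distance to the goal; since both signals come from logged transitions, the loss is compatible with off-policy data.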