Talk in Workshop: Deep Reinforcement Learning
Invited talk: Ashley Edwards "Learning Offline from Observation"
Ashley Edwards
A common trope in sci-fi is a robot that can quickly solve some problem after watching a person, studying a video, or reading a book. While these settings are (currently) fictional, the benefits are real. Agents that can solve tasks by observing others have the potential to greatly reduce the burden on their human teachers, removing some of the need to hand-specify rewards or goals. In this talk, I consider how an agent can not only learn by observing others, but also do so quickly, by training offline before taking any steps in the environment. First, I will describe an approach that trains a latent policy directly from state observations, which can then be quickly mapped to real actions in the agent’s environment. Then I will describe how we can train a novel value function, Q(s,s’), to learn off-policy from observations. Unlike previous imitation-from-observation approaches, this formulation goes beyond simple imitation and enables learning from potentially suboptimal observations.
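As a rough illustration of the Q(s,s’) idea mentioned in the abstract, the sketch below is not from the talk; the tabular setting, the toy dynamics, and all function names are assumptions made purely for illustration. It shows a value function defined over state-to-state transitions, updated with a Bellman-style backup that maximizes over reachable next states rather than over actions, so that no action labels are needed in the observed data.

```python
import numpy as np

# Hypothetical tabular illustration of a Q(s, s') value function:
# the value of transitioning from state s to state s', updated by
# maximizing over the states reachable from s' (instead of over actions).

num_states = 5
gamma = 0.99   # discount factor (assumed)
alpha = 0.1    # learning rate (assumed)

# Q[s, s_next]: estimated value of the transition s -> s_next.
Q = np.zeros((num_states, num_states))

def reachable(s):
    """Assumed toy dynamics: from each state the agent can stay put
    or move one step to the right."""
    return [s, min(s + 1, num_states - 1)]

def qss_update(s, s_next, reward):
    """One Bellman-style backup for Q(s, s'):
    Q(s, s') <- r + gamma * max over s'' reachable from s' of Q(s', s'')."""
    target = reward + gamma * max(Q[s_next, s2] for s2 in reachable(s_next))
    Q[s, s_next] += alpha * (target - Q[s, s_next])

# Example: learn from an observed (state, next state, reward) transition,
# with no action labels required.
qss_update(s=0, s_next=1, reward=1.0)
print(Q[0, 1])
```

Note that acting with such a state-to-state formulation still requires some way to realize a chosen next state as a concrete action, which echoes the abstract's point about mapping latent decisions back to real actions in the agent's environment.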