

Poster in Workshop: Deep Reinforcement Learning

Variance-Seeking Meta-Exploration to Handle Out-of-Distribution Tasks

Yashvir Singh Grewal · Sarah Goodwin


Abstract:

Meta-Reinforcement Learning (meta-RL) has the potential to improve the sample efficiency of reinforcement learning algorithms. By training an agent on multiple meta-RL tasks, the agent learns a policy from past experience and can leverage it to solve new, unseen tasks. Accordingly, meta-RL promises to solve real-world problems, such as real-time heating, ventilation and air-conditioning (HVAC) control, without accurate simulators of the target building. In this paper, we propose a meta-RL method which trains an agent on first-order models to efficiently learn and adapt to the internal dynamics of a real-world building. We recognise that meta-agents trained on first-order simulator models do not perform well on second-order models, because meta-RL assumes that test tasks are drawn from the same distribution as the training tasks. In response, we propose a novel exploration method called variance-seeking meta-exploration, which enables a meta-RL agent to perform well on complex tasks outside of its training distribution. Our method programs the agent to prefer exploring task-dependent state-action pairs, and in turn allows it to adapt efficiently to challenging second-order models which bear a greater resemblance to real-world problems.
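The abstract does not spell out how the variance-seeking preference is computed; below is a minimal sketch of one plausible realisation, not the authors' implementation. It assumes an ensemble of learned per-task dynamics models with a hypothetical predict(state, action) interface, and the names (models, beta, variance_bonus) are illustrative rather than taken from the paper. The idea is that state-action pairs whose predicted outcomes vary across tasks are the task-dependent ones worth exploring.

import numpy as np

def variance_bonus(models, state, action):
    """Variance of predicted next states across per-task dynamics models.

    High variance flags state-action pairs whose outcome depends on the
    (unknown) task, so visiting them is informative for task inference.
    Assumes each model exposes a predict(state, action) -> next_state method.
    """
    preds = np.stack([m.predict(state, action) for m in models])  # (K, state_dim)
    return preds.var(axis=0).sum()  # total predictive variance across tasks

def shaped_reward(env_reward, models, state, action, beta=0.1):
    # Augment the environment reward with the variance bonus so the
    # meta-agent prefers task-dependent transitions; beta trades off
    # exploration against the task reward.
    return env_reward + beta * variance_bonus(models, state, action)

Under this reading, the bonus vanishes on state-action pairs where all training tasks agree, steering exploration toward transitions that help identify the current task, including second-order dynamics outside the training distribution.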
