Poster
in
Workshop: Agent Learning in Open-Endedness Workshop
Procedural generation of meta-reinforcement learning tasks
Thomas Miconi
Keywords: [ Meta-RL ] [ Meta-Learning ] [ Procedural Generation ] [ POMDP ]
Open-endedness stands to benefit from the ability to generate an infinite variety of diverse, challenging environments. One particularly interesting type of challenge is meta-learning (``learning-to-learn''), a hallmark of intelligent behavior. However, the number of meta-learning environments in the literature is limited. Here we describe a parametrized space for simple meta-reinforcement learning (meta-RL) tasks with arbitrary stimuli. The parametrization allows us to randomly generate an arbitrary number of novel simple meta-learning tasks. The parametrization is expressive enough to include many well-known meta-RL tasks, such as bandit tasks, the Harlow task, T-mazes, the Daw two-step task and others. Simple extensions allow it to capture tasks based on two-dimensional topological spaces, such as find-the-spot or key-door tasks. We describe a number of randomly generated meta-RL tasks and discuss potential issues arising from random generation.