Poster in Workshop: Goal-Conditioned Reinforcement Learning
Using Proto-Value Functions for Curriculum Generation in Goal-Conditioned RL
Henrik Metternich · Ahmed Hendawy · Pascal Klink · Jan Peters · Carlo D'Eramo
Keywords: [ Graph Laplacian ] [ Reinforcement Learning ] [ Curriculum Learning ]
In this paper, we investigate the use of Proto-Value Functions (PVFs) for measuring the similarity between tasks in the context of Curriculum Learning (CL). PVFs serve as a mathematical framework for generating basis functions for the state space of a Markov Decision Process (MDP). They capture the structure of the state space manifold and have been shown to be useful for value function approximation in Reinforcement Learning (RL). We show that even a few PVFs allow us to estimate the similarity between tasks. Based on this observation, we introduce a new algorithm called Curriculum Representation Policy Iteration (CRPI) that uses PVFs for CL, and we provide a proof of concept in a Goal-Conditioned Reinforcement Learning (GCRL) setting.
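To illustrate the core idea, the following is a minimal Python sketch, not the paper's CRPI algorithm: it computes PVFs as the smallest eigenvectors of the normalized graph Laplacian of a 4-connected gridworld and compares goal-conditioned tasks via the PVF embeddings of their goal states. The environment choice, the cosine-similarity measure, and all function names (grid_adjacency, proto_value_functions, task_embedding, task_similarity) are illustrative assumptions, not details from the paper.

```python
# Sketch only: PVFs of a gridworld MDP and a PVF-based task similarity.
# Assumptions (not from the paper): 4-connected gridworld, tasks are
# identified with goal states, similarity is cosine similarity between
# the PVF embeddings of goal-indicator reward vectors.
import numpy as np

def grid_adjacency(n):
    """Adjacency matrix of an n x n gridworld with 4-connected moves."""
    size = n * n
    A = np.zeros((size, size))
    for r in range(n):
        for c in range(n):
            s = r * n + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    A[s, rr * n + cc] = 1.0
    return A

def proto_value_functions(A, k):
    """PVFs: the k smallest eigenvectors of the normalized Laplacian."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, :k]            # columns are the low-frequency PVFs

def task_embedding(goal_state, pvfs):
    """Project a goal-indicator reward vector onto the PVF basis."""
    r = np.zeros(pvfs.shape[0])
    r[goal_state] = 1.0
    return pvfs.T @ r                # equals pvfs[goal_state]

def task_similarity(g1, g2, pvfs):
    """Cosine similarity between the PVF embeddings of two goals."""
    e1 = task_embedding(g1, pvfs)
    e2 = task_embedding(g2, pvfs)
    return e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12)

n = 8
pvfs = proto_value_functions(grid_adjacency(n), k=10)  # only a few PVFs
print(task_similarity(0, 1, pvfs))          # adjacent goals: high similarity
print(task_similarity(0, n * n - 1, pvfs))  # opposite corners: lower
```

Because the low-frequency PVFs vary smoothly over the state-space manifold, goals that are close on the graph receive similar embeddings, which is why even a few PVFs suffice to order tasks from near (easy) to far (hard) when generating a curriculum.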