Tutorial
(Track 3) Policy Optimization in Reinforcement Learning Q&A
Sham M Kakade · Martha White · Nicolas Le Roux
This tutorial will cover policy gradient methods in reinforcement learning, with a focus on understanding foundational ideas from an optimization perspective. We will discuss two properties of the policy objective that are critical for convergence rates under stochastic gradient approaches: variance and curvature. We will explain how the policy objective can be a particularly difficult optimization problem, as it can have large flat regions and stochastic samples of the gradient can have very high variance. We will first explain how to use standard tools from optimization to reduce the variance of the gradient estimate, as well as techniques to mitigate curvature issues. We will then discuss optimization improvements that leverage more knowledge about the objective, including the Markov property and how to modify the state distribution for more coverage. We will discuss how standard Actor-Critic methods with (off-policy) data re-use provide RL-specific variance reduction approaches. We will then conclude with an overview of what is known theoretically about the policy objective, where we discuss the role of entropy regularization and exploration for mitigating curvature issues. The tutorial website is here: https://sites.google.com/ualberta.ca/rlandoptimization-neurips2020/home
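As a rough illustration of the variance-reduction idea mentioned above (this is a minimal sketch, not material from the tutorial), the snippet below compares REINFORCE (score-function) gradient estimates with and without a baseline on a hypothetical two-armed bandit with a softmax policy. The setup, function names, and noise scale are assumptions chosen only for illustration; subtracting a baseline leaves the estimator's mean unchanged while shrinking its per-sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-armed bandit (assumed for illustration): action 0 pays 1.0, action 1 pays 0.0.
rewards = np.array([1.0, 0.0])

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def grad_log_pi(logits, action):
    # Gradient of log pi(action) for a softmax policy: one-hot(action) - pi.
    pi = softmax(logits)
    onehot = np.zeros_like(pi)
    onehot[action] = 1.0
    return onehot - pi

def reinforce_samples(logits, n, baseline=0.0):
    # Score-function (REINFORCE) gradient estimates, optionally with a baseline.
    grads = []
    for _ in range(n):
        pi = softmax(logits)
        a = rng.choice(len(pi), p=pi)
        r = rewards[a] + rng.normal(scale=0.1)  # noisy reward sample
        grads.append(grad_log_pi(logits, a) * (r - baseline))
    return np.array(grads)

logits = np.zeros(2)
plain = reinforce_samples(logits, n=10_000, baseline=0.0)
with_b = reinforce_samples(logits, n=10_000, baseline=rewards.mean())

# Both estimators share (approximately) the same mean, i.e. the baseline keeps the
# estimator unbiased, but it reduces the variance of each gradient component.
print("mean (no baseline):      ", plain.mean(axis=0))
print("mean (with baseline):    ", with_b.mean(axis=0))
print("variance (no baseline):  ", plain.var(axis=0))
print("variance (with baseline):", with_b.var(axis=0))
```

Running this prints nearly identical means for the two estimators but noticeably smaller variances when the baseline is used; actor-critic methods extend this idea by learning a state-dependent baseline (a critic).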
Timetable:
Nicolas - 40-minute presentation + 10-minute Q&A
Martha - 40-minute presentation + 10-minute Q&A
Sham - 40-minute presentation + 10-minute Q&A
Bios and timetable are on the website: https://sites.google.com/ualberta.ca/rlandoptimization-neurips2020/home