NeurIPS 2020: Towards Safe Policy Improvement for Non-Stationary MDPs



Towards Safe Policy Improvement for Non-Stationary MDPs

Yash Chandak, Scott Jordan, Georgios Theocharous, Martha White, Philip Thomas

Spotlight presentation: Orals & Spotlights Track 20: Social/Adversarial Learning
on 2020-12-09T08:00:00-08:00 - 2020-12-09T08:10:00-08:00

Poster Session 4
on 2020-12-09T09:00:00-08:00 - 2020-12-09T11:00:00-08:00
GatherTown: Social Aspects of Machine Learning (Town B0 - Spot B1)

Abstract: Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy’s forecasted performance, and confidence intervals are obtained using wild bootstrap.
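
The last sentence of the abstract names two ingredients: a forecast of a policy's future performance, and a wild-bootstrap confidence interval around that forecast. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a plain linear trend fitted to a series of past performance estimates, Rademacher (random sign) multipliers for the wild bootstrap, a percentile interval, and a fixed `baseline` standing in for the performance of a known-safe policy. The function names, the trend model, and the single (non-sequential) test are all illustrative assumptions.

```python
import random


def fit_line(ts, ys):
    """Ordinary least squares fit of y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def wild_bootstrap_forecast_interval(ys, horizon=1, n_boot=2000, alpha=0.05, seed=0):
    """Forecast performance `horizon` steps ahead with a percentile interval.

    `ys` holds past performance estimates, one per time step (assumed input).
    Returns (point_forecast, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    ts = list(range(len(ys)))
    a, b = fit_line(ts, ys)
    fitted = [a + b * t for t in ts]
    resid = [y - f for y, f in zip(ys, fitted)]
    t_future = len(ys) - 1 + horizon
    point = a + b * t_future

    forecasts = []
    for _ in range(n_boot):
        # Wild bootstrap: keep each residual at its own time step, but flip
        # its sign at random (Rademacher multiplier), then refit and forecast.
        ys_star = [f + r * rng.choice((-1.0, 1.0)) for f, r in zip(fitted, resid)]
        a_s, b_s = fit_line(ts, ys_star)
        forecasts.append(a_s + b_s * t_future)

    forecasts.sort()
    lower = forecasts[int((alpha / 2) * n_boot)]
    upper = forecasts[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


def is_safe(ys, baseline, **kwargs):
    """Accept only if the lower bound on forecasted performance beats `baseline`."""
    _, lower, _ = wild_bootstrap_forecast_interval(ys, **kwargs)
    return lower > baseline
```

In this sketch a candidate policy would be adopted only when `is_safe` returns `True`, so a noisy or short history widens the interval and makes the check more conservative. Repeating the check over successive deployments, as the abstract's sequential testing implies, is left out here.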
