

Poster

Safe and Efficient: A Primal-Dual Method for Offline Convex CMDPs under Partial Data Coverage

Haobo Zhang · Xiyue Peng · Honghao Wei · Xin Liu

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract: Offline safe reinforcement learning (RL) aims to find an optimal policy from a pre-collected dataset when online data collection is impractical or risky. We propose a novel linear programming (LP) based primal-dual algorithm for convex MDPs that incorporates "uncertainty" parameters to improve data efficiency while requiring only a partial data coverage assumption. Our theoretical results achieve a sample complexity of $\mathcal{O}\big(1/((1-\gamma)\sqrt{n})\big)$ under general function approximation, improving the current state-of-the-art by a factor of $1/(1-\gamma)$, where $n$ is the number of data samples in the offline dataset and $\gamma$ is the discount factor. Numerical experiments validate our theoretical findings, demonstrating the practical efficacy of our approach in improving safety and learning efficiency in the offline safe setting.
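As a concrete point of reference, the sketch below runs a generic Lagrangian primal-dual iteration on a small synthetic tabular constrained MDP: the primal step improves the policy against the Lagrangian reward $r - \lambda c$, and the dual step performs projected gradient ascent on the cost constraint. This is only a minimal illustration of the primal-dual template, not the paper's LP-based method with uncertainty parameters and function approximation; the model, step sizes (`eta`, `lam_lr`), and cost budget are illustrative assumptions.

```python
# Minimal, illustrative Lagrangian primal-dual loop on a toy tabular constrained MDP.
# NOTE: this is NOT the paper's LP-based algorithm; the model, step sizes, and the
# mirror-ascent policy update below are assumptions made purely for illustration.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
r = rng.uniform(size=(n_states, n_actions))                       # reward
c = rng.uniform(size=(n_states, n_actions))                       # cost
budget = 4.0                                # constraint: discounted cost <= budget
mu0 = np.full(n_states, 1.0 / n_states)     # initial-state distribution

def occupancy(policy):
    """Normalized discounted state-action occupancy measure d(s, a)."""
    P_pi = np.einsum("sap,sa->sp", P, policy)
    d_s = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * policy

def q_values(policy, reward):
    """Q-function of `policy` under `reward` (exact policy evaluation)."""
    P_pi = np.einsum("sap,sa->sp", P, policy)
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, (policy * reward).sum(axis=1))
    return reward + gamma * P @ v

policy = np.full((n_states, n_actions), 1.0 / n_actions)
lam, eta, lam_lr = 0.0, 1.0, 0.5
for _ in range(200):
    # Primal step: mirror-ascent (multiplicative-weights) update on the
    # Lagrangian reward r - lam * c; max-subtraction keeps exp() stable.
    q = q_values(policy, r - lam * c)
    policy = policy * np.exp(eta * (q - q.max(axis=1, keepdims=True)))
    policy /= policy.sum(axis=1, keepdims=True)
    # Dual step: projected gradient ascent on the constraint violation.
    cost_value = (occupancy(policy) * c).sum() / (1 - gamma)
    lam = max(0.0, lam + lam_lr * (cost_value - budget))

d = occupancy(policy)
print("return:", (d * r).sum() / (1 - gamma), "cost:", (d * c).sum() / (1 - gamma))
```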
