

Poster in Workshop: Towards Safe & Trustworthy Agents

C-MCTS: Safe Planning with Monte Carlo Tree Search

Dinesh Parthasarathy · Georgios Kontes · Axel Plinge · Christopher Mutschler


Abstract:

The Constrained Markov Decision Process (CMDP) formulation makes it possible to solve safety-critical decision-making tasks that are subject to constraints. While CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as Monte Carlo Tree Search (MCTS) for solving them. Previous approaches are conservative with respect to costs, as they avoid constraint violations by relying on Monte Carlo cost estimates that suffer from high variance. We propose Constrained MCTS (C-MCTS), which estimates costs using a safety critic trained with Temporal Difference learning in an offline phase prior to agent deployment. During deployment, the critic limits exploration of unsafe regions by pruning unsafe trajectories within MCTS, making C-MCTS more efficient with respect to planning steps. Compared to previous work, C-MCTS achieves higher rewards by operating closer to the constraint boundary (while satisfying cost constraints) and is less susceptible to cost violations under model mismatch between the planner and the deployment environment.
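The core pruning idea described above can be sketched minimally as follows. This is an illustrative stand-in, not the authors' implementation: the critic here is a hypothetical callable returning a cost estimate for a state-action pair, and `prune_unsafe_actions` is an assumed helper name; in C-MCTS the critic would be a TD-trained network queried during tree expansion.

```python
def prune_unsafe_actions(state, actions, cost_critic, cost_budget):
    """Keep only actions whose critic-estimated cost stays within the
    remaining cost budget; pruned actions are never expanded in the tree.

    This mirrors the abstract's mechanism in spirit only: a learned
    safety critic replaces high-variance Monte Carlo cost rollouts.
    """
    return [a for a in actions if cost_critic(state, a) <= cost_budget]


# Toy critic for demonstration: pretend each action's index is its
# predicted cumulative cost (a real critic would be a trained model).
def toy_critic(state, action):
    return float(action)


# With a budget of 1.5, only actions 0 and 1 survive pruning.
safe = prune_unsafe_actions(state=0, actions=[0, 1, 2, 3],
                            cost_critic=toy_critic, cost_budget=1.5)
print(safe)  # [0, 1]
```

In a full planner, this filter would run at every node expansion, so subtrees whose estimated cost exceeds the budget are excluded from search entirely, which is what lets the planner operate close to the constraint boundary without wasting simulations on trajectories that will be rejected.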
