Poster
From Text to Trajectory: Exploring Unified Complex Constraint Representation and Decomposition in Safe Reinforcement Learning
Pusen Dong · Tianchen Zhu · Yue Qiu · Haoyi Zhou · Jianxin Li
Safe reinforcement learning (RL) requires agents to complete given tasks while obeying specific constraints. Specifying constraints in natural language has great potential in practical scenarios because it is accessible and transfers flexibly across settings. Previous safe RL methods with natural language constraints typically require a manually designed cost function for each constraint, which demands domain expertise and lacks flexibility. Additionally, these methods can typically constrain only a single state or entity, which limits their generality. To address these issues, we first introduce a universal type of natural language constraint, the trajectory-level textual constraint, which can model any constraint requirement in real-world scenarios and thus covers a broader constraint scope. Further, to replace the standard manually designed cost functions, we introduce the Unified Trajectory-Level Textual Constraints Translator (U3T), which has two components: (1) a text-trajectory alignment component, a multimodal learning framework that connects trajectories with their corresponding textual constraints to accurately predict violations; and (2) a cost assignment component, which uses an attention mechanism to capture the relevance between each state-action pair and the textual constraint, and then assigns cost according to that relevance to address the cost sparsity issue in this setting. Our empirical results demonstrate that U3T effectively comprehends textual constraints and trajectories, and that policies trained with U3T achieve a lower violation rate than those trained with the standard cost function. Additional studies demonstrate that U3T has zero-shot transfer capability, allowing it to adapt to constraint-shift environments.
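The cost assignment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `assign_costs`, the embedding shapes, the random inputs, and the choice to split a single trajectory-level cost by softmax attention weights are all assumptions made for illustration. The abstract states only that relevance is captured with an attention mechanism and that cost is assigned according to relevance.

```python
import numpy as np


def assign_costs(step_embeddings, text_embedding, trajectory_cost=1.0):
    """Spread one trajectory-level cost over time steps by attention weight.

    Hypothetical helper: step_embeddings has shape (T, d), one row per
    state-action pair; text_embedding has shape (d,).
    """
    d = step_embeddings.shape[1]
    # Scaled dot-product scores: the text acts as the query, each step as a key.
    scores = step_embeddings @ text_embedding / np.sqrt(d)
    # Numerically stable softmax over the T time steps.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Steps that are more relevant to the constraint receive more of the cost.
    return trajectory_cost * weights


# Toy inputs standing in for learned embeddings (assumed, not from the paper).
rng = np.random.default_rng(0)
steps = rng.normal(size=(5, 8))   # 5 state-action pairs, 8-dim embeddings
text = rng.normal(size=8)         # embedding of one textual constraint
costs = assign_costs(steps, text)
```

In this sketch the per-step costs are non-negative and sum to `trajectory_cost`, so a sparse end-of-trajectory violation signal becomes a dense per-step signal under the stated assumptions.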