Poster in Workshop: 6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models
T3GDT: Three-Tier Tokens to Guide Decision Transformer for Offline Meta Reinforcement Learning
Zhe Wang · Haozhu Wang · Yanjun Qi
Keywords: [ Decision Transformer ] [ Offline Meta RL ] [ Meta Learning ]
Abstract:
Offline meta-reinforcement learning (OMRL) aims to generalize an agent's knowledge from training tasks with offline data to a new, unseen RL task given only a few demonstration trajectories. This paper proposes T3GDT: Three-Tier Tokens to Guide Decision Transformer for OMRL. First, our approach learns a global token from a task's demonstrations to summarize its transition dynamics and reward pattern. This global token specifies the task identity and is prepended as the first token to prompt the task's RL roll-out. Second, for each timestep $t$, we learn adaptive tokens retrieved from the most relevant experiences in the demonstrations; these tokens are fused into the model to improve action prediction at timestep $t$. Third, we replace the lookup-table-based time embedding with a Time2Vec embedding that incorporates temporal neighborhood relationships into a better time representation for RL. Empirically, we compare T3GDT with Prompt Decision Transformer variants and MACAW across five RL environments from the MuJoCo control and MetaWorld benchmarks.
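A rough sketch of how the first two tiers might be assembled, assuming a mean-pooled transition encoding for the global task token and cosine-similarity retrieval for the adaptive tokens; the function names, the pooling choice, and the additive fusion below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def global_task_token(demo_embeds: torch.Tensor) -> torch.Tensor:
    """Tier 1 (assumed): summarize a task's encoded demonstration
    transitions (N, D) into a single (D,) token, e.g. by mean pooling,
    to prepend as the first token of the roll-out sequence."""
    return demo_embeds.mean(dim=0)

def adaptive_tokens(query: torch.Tensor, demo_embeds: torch.Tensor,
                    k: int = 4) -> torch.Tensor:
    """Tier 2 (assumed): retrieve the k demonstration embeddings most
    cosine-similar to the current step's embedding."""
    sims = F.cosine_similarity(query.unsqueeze(0), demo_embeds, dim=-1)  # (N,)
    return demo_embeds[sims.topk(k).indices]  # (k, D)

# Illustrative use at timestep t: fuse retrieved tokens into the step token.
D, N = 128, 256
demo_embeds = torch.randn(N, D)   # encoded demonstration transitions (assumed)
step_token = torch.randn(D)       # embedding of the current roll-out step
prompt = global_task_token(demo_embeds)
fused = step_token + adaptive_tokens(step_token, demo_embeds).mean(dim=0)
```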
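For the third tier, the following minimal PyTorch sketch implements a Time2Vec-style embedding (one linear plus several periodic components, following Kazemi et al.'s Time2Vec formulation) that could stand in for a lookup-table time embedding; the module and dimensions are illustrative, and the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Time2Vec-style embedding: one linear component plus periodic
    (sine) components, so nearby timesteps receive nearby embeddings,
    unlike independent rows of a lookup table."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.w0 = nn.Parameter(torch.randn(1))
        self.b0 = nn.Parameter(torch.zeros(1))
        self.w = nn.Parameter(torch.randn(embed_dim - 1))
        self.b = nn.Parameter(torch.zeros(embed_dim - 1))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, seq_len) integer or float timesteps
        t = t.unsqueeze(-1).float()                   # (B, T, 1)
        linear = self.w0 * t + self.b0                # (B, T, 1)
        periodic = torch.sin(self.w * t + self.b)     # (B, T, D-1)
        return torch.cat([linear, periodic], dim=-1)  # (B, T, D)

emb = Time2Vec(embed_dim=128)
timesteps = torch.arange(20).unsqueeze(0)  # one trajectory of 20 steps
print(emb(timesteps).shape)  # torch.Size([1, 20, 128])
```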