Poster in Workshop: 6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models
A$^2$Nav: Action-Aware Zero-Shot Robot Navigation Using Vision-Language Ability of Foundation Models
Peihao Chen · Xinyu Sun · Hongyan Zhi · Runhao Zeng · Thomas Li · Mingkui Tan · Chuang Gan
Keywords: [ Vision-and-Language Navigation ] [ Foundation Models ] [ Zero-Shot Learning ]
Abstract:
We tackle the challenging task of zero-shot vision-and-language navigation (ZS-VLN), where an agent must follow complex path instructions without any annotated navigation data. We introduce A$^2$Nav, an action-aware ZS-VLN method that leverages foundation models such as GPT and CLIP. Our approach consists of an instruction parser and an action-aware navigation policy: the parser decomposes a complex instruction into a sequence of action-aware sub-tasks, each of which is then executed by the corresponding learned action-specific navigation policy. Extensive experiments show that A$^2$Nav achieves promising ZS-VLN performance and even surpasses some supervised learning methods on the R2R-Habitat and RxR-Habitat datasets.
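To make the parse-then-execute pipeline described in the abstract concrete, the Python sketch below shows one plausible way an LLM-based parser could decompose an instruction into (action, landmark) sub-tasks that are then dispatched to action-specific policies. This is a minimal illustrative sketch, not the authors' implementation: the function and class names (`parse_instruction`, `ActionPolicy`, `navigate`), the prompt format, and the action vocabulary are all assumptions made for illustration.

```python
"""Illustrative sketch (not the authors' code) of an action-aware zero-shot
navigation pipeline: an LLM parses a free-form instruction into
(action, landmark) sub-tasks, and each sub-task is dispatched to a
dedicated action-specific policy. Names and the action set are hypothetical."""
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol

# Hypothetical action vocabulary; the real method's action categories may differ.
ACTIONS = {"go_forward", "turn", "enter", "exit", "stop"}


@dataclass
class SubTask:
    action: str    # one of ACTIONS
    landmark: str  # landmark the action is grounded to, e.g. "the red door"


class ActionPolicy(Protocol):
    """Interface each learned action-specific navigator is assumed to expose."""
    def execute(self, landmark: str) -> bool: ...


def parse_instruction(instruction: str,
                      llm_complete: Callable[[str], str]) -> List[SubTask]:
    """Ask an LLM (e.g., GPT) to decompose the instruction into sub-tasks.

    `llm_complete` is any callable mapping a prompt string to a completion
    string; the prompt and output format here are illustrative assumptions.
    """
    prompt = (
        "Decompose the navigation instruction into a numbered list of steps, "
        f"each formatted as 'action | landmark' with action in {sorted(ACTIONS)}.\n"
        f"Instruction: {instruction}\nSteps:"
    )
    completion = llm_complete(prompt)
    sub_tasks: List[SubTask] = []
    for line in completion.splitlines():
        if "|" not in line:
            continue
        action, landmark = [part.strip() for part in line.split("|", 1)]
        # Strip the list numbering and normalize, e.g. "1. Go forward" -> "go_forward".
        action = action.lstrip("0123456789. ").lower().replace(" ", "_")
        if action in ACTIONS:
            sub_tasks.append(SubTask(action=action, landmark=landmark))
    return sub_tasks


def navigate(instruction: str,
             policies: Dict[str, ActionPolicy],
             llm_complete: Callable[[str], str]) -> None:
    """Execute the parsed sub-tasks sequentially with action-specific policies."""
    for task in parse_instruction(instruction, llm_complete):
        policy = policies[task.action]
        if not policy.execute(task.landmark):
            break  # stop early if a sub-task cannot be completed
```

In this sketch, the language model is used only for decomposition; grounding each landmark and reaching it would be handled inside the action-specific policies (e.g., with CLIP-style visual grounding), mirroring the division of labor the abstract describes.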