Poster in Workshop on Behavioral Machine Learning
HuLE-Nav: Human-Like Exploration for Zero-Shot Object Navigation via Vision-Language Models
Peilong Han · Min Zhang · Jianye Hao · Hongyao Tang · Yan Zheng
Enabling robots to navigate as efficiently as humans in unknown environments is an attractive and challenging research goal in the field of embodied intelligence. Examining human exploration behavior, we find that scene semantic understanding, scene spatio-temporal memory, and accumulated knowledge are all key to efficient navigation. Inspired by this, we propose HuLE-Nav, a zero-shot object navigation method with two core components: multi-dimensional semantic value maps that serve as a human-like exploration memory, and a human-like exploration process built on those maps. Specifically, HuLE-Nav first leverages off-the-shelf Vision-Language Models (VLMs) and real-time observations to dynamically capture the semantic relevance between objects, scene-level semantics, and the spatio-temporal history of exploration paths, and jointly represents them as multi-dimensional semantic value maps. Then, mimicking the active exploration behavior of humans, we propose a dynamic exploration and replanning mechanism that flexibly updates the long-term goal based on the continuously updated semantic value maps. Finally, we propose a collision escape strategy that leverages the reasoning and planning capabilities of VLMs to prevent the robot from getting stuck in collisions. Extensive evaluation on HM3D shows that HuLE-Nav outperforms the best-performing competitor by +7.3% in success rate and +27.7% in exploration efficiency.
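As a rough illustration of the replanning idea, the Python sketch below shows how per-cell value channels could be fused and a long-term goal re-selected as the maps are updated. The abstract does not specify the map representation, fusion rule, or scoring, so every name here (SemanticValueMap, update, select_goal), the grid resolution, and the weighted-sum goal selection are hypothetical assumptions, not the authors' implementation.

import numpy as np

class SemanticValueMap:
    """Hypothetical sketch of a multi-dimensional semantic value map.

    Each channel stores one exploration cue named in the abstract:
    object-level semantic relevance, scene-level semantics, and a
    spatio-temporal record of already-explored cells.
    """

    def __init__(self, size=64):  # grid resolution is an assumption
        self.relevance = np.zeros((size, size))  # VLM-scored relevance to the goal object
        self.scene = np.zeros((size, size))      # VLM-scored scene-level semantics
        self.visited = np.zeros((size, size))    # exploration history (visit counts)

    def update(self, cell, relevance_score, scene_score):
        """Fuse new VLM scores for an observed cell; running-max fusion is an assumption."""
        r, c = cell
        self.relevance[r, c] = max(self.relevance[r, c], relevance_score)
        self.scene[r, c] = max(self.scene[r, c], scene_score)
        self.visited[r, c] += 1.0

    def select_goal(self, weights=(1.0, 0.5, 0.3)):
        """Re-select the long-term goal as the cell of highest combined value,
        penalizing already-visited cells; the weighting is illustrative."""
        w_rel, w_scene, w_visit = weights
        value = w_rel * self.relevance + w_scene * self.scene - w_visit * self.visited
        return np.unravel_index(np.argmax(value), value.shape)

# Hypothetical use inside a navigation loop: update the maps with each
# observation, then replan the long-term goal from the freshly updated values.
vmap = SemanticValueMap()
vmap.update(cell=(10, 20), relevance_score=0.8, scene_score=0.6)
goal_cell = vmap.select_goal()

In this sketch, re-running select_goal after every map update is what stands in for the paper's dynamic exploration and replanning mechanism; the visit-count penalty is one simple way to encode the spatio-temporal memory that discourages revisiting explored regions.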