Poster
in
Workshop: Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024)
Planning as Inpainting: A Generative Framework for Realistic Embodied Path Planning
Cheng-Fu Yang · Haoyang Xu · Te-Lin Wu · Xiaofeng Gao · Kai-Wei Chang · Feng Gao
Keywords: [ Multimodal Navigation · Embodied AI · Robotics ]
Embodied path planning, the process of determining optimal navigation paths or trajectories for robotic operations, is pivotal for the autonomy of robots in the wild. Generative methods have shown great promise in avoiding myopic decisions by predicting the entire trajectory simultaneously. However, whether such methods generalize to more realistic settings, with high-dimensional state spaces and potential partial observability, remains an open question. To address these problems, we propose a generative framework, "planning-as-inpainting", which reframes path planning by using the environmental map as a dynamic canvas on which to "inpaint" the predicted trajectories. This approach makes effective use of high-dimensional observations throughout the planning process because it (1) precisely captures intricate environmental nuances, and (2) preserves the spatial relationships and physical constraints present in the environment. To tackle the prevalent issue of models hallucinating future decisions when planning under partial observability, our framework integrates a language-conditioning mechanism. This mechanism grounds and infers the target position within the environment, increasing the accuracy and reliability of the plan. The proposed framework achieves promising performance across various embodied AI tasks, including vision-language navigation, object manipulation, and task planning in a realistic egocentric environment, highlighting its capability to handle the complexities of real-world scenarios.