Poster in Workshop: Agent Learning in Open-Endedness Workshop
JARVIS-1: Open-Ended Multi-task Agents with Memory-Augmented Multimodal Language Models
Zihao Wang · Shaofei Cai · Anji Liu · Xiaojian (Shawn) Ma · Yitao Liang
Keywords: [ Open-ended Agent ] [ Multi-task Agent ] [ Minecraft ]
We propose JARVIS-1, a multi-task agent designed for the complex environment of Minecraft, which marks a significant advance toward human-like planning in an open-world setting. By leveraging pre-trained Vision-Language Models, JARVIS-1 interprets multimodal inputs and translates them into actions. Its multimodal memory, which draws on both ingrained knowledge and real-time game experiences, enhances its decision-making. JARVIS-1 performs strongly across a wide array of tasks in Minecraft. Notably, on the long-horizon diamond pickaxe task, it achieves a completion rate that surpasses VPT's by up to 5 times. This result sets the stage for more versatile and adaptable agents in complex virtual environments.