NeurIPS Invited talk: Scaling Multimodal Computer Agents

Invited Talk
in
Workshop: Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024)

Invited talk: Scaling Multimodal Computer Agents

Tao Yu

Sun 15 Dec 3:30 p.m. PST — 4 p.m. PST

Abstract:

Recent advances in vision-language models (VLMs) have enabled AI agents to operate computers just as humans do. In this talk, I will present our approach to scaling these agents along three key dimensions: data, methods, and evaluation. First, I will introduce how we leverage internet-scale instructional videos and human demonstrations via our AgentNet platform to build large-scale computer interaction datasets. I will then discuss our methods for training foundation models that ground natural language into interface actions. Finally, I will present Agent Arena, our open platform for scalable real-world evaluation through crowdsourced user-computer interactions, and outline key directions for improving agent robustness and safety for real-world deployment.
