NeurIPS Multimodal Auto Validation For Self-Refinement in Web Agents

Poster in Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024)

Ruhana Azam · Tamer Abuelsaad · Aditya Vempaty · Ashish Jagmohan

Keywords: [ Multi-modal Large Language Models ] [ Iterative Refinement ] [ Automatic Evaluation ]

Abstract:

As our world digitizes, web agents that can automate complex and monotonous tasks are becoming essential in streamlining workflows. This paper introduces an approach to improving web agent performance through multimodal validation and self-refinement. We present a comprehensive study of different modalities (text, vision) and the effect of hierarchy in the automatic validation of web agents, building upon the state-of-the-art Agent-E web automation framework. We also introduce a self-refinement mechanism for web automation that uses the developed auto-validator to enable web agents to detect and self-correct workflow failures. Our results show significant gains over Agent-E's prior state-of-the-art performance, boosting task-completion rates from 76.2% to 81.24% on a subset of the WebVoyager benchmark. The approach presented in this paper paves the way for more reliable digital assistants in complex, real-world scenarios.
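
The abstract describes the self-refinement mechanism only at a high level, so the following is a minimal sketch of what a validate-and-retry loop of that kind might look like, not the authors' implementation. Every name here (`Verdict`, `run_with_self_refinement`, `toy_agent`, `toy_validator`, `max_attempts`) is hypothetical, and the validator is abstracted into a plain callable; in a real system it could consume text, screenshots, or both.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Outcome of one validation pass (hypothetical structure)."""
    success: bool
    feedback: str = ""


def run_with_self_refinement(
    task: str,
    agent: Callable[[str, Optional[str]], str],
    validator: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> tuple[str, bool, int]:
    """Run the agent, validate the result, and retry with feedback.

    Returns (last_result, validated, attempts_used).
    """
    feedback: Optional[str] = None
    result = ""
    for attempt in range(1, max_attempts + 1):
        # The agent sees the task plus any feedback from the previous failure.
        result = agent(task, feedback)
        verdict = validator(task, result)
        if verdict.success:
            return result, True, attempt
        # Carry the validator's explanation into the next attempt.
        feedback = verdict.feedback
    return result, False, max_attempts


# Toy stand-ins (assumptions for illustration only): the agent succeeds
# only once it has received feedback from a failed validation.
def toy_agent(task: str, feedback: Optional[str]) -> str:
    return "done" if feedback else "incomplete"


def toy_validator(task: str, result: str) -> Verdict:
    if result == "done":
        return Verdict(True)
    return Verdict(False, "The final page does not show the requested item.")


outcome = run_with_self_refinement("find a recipe", toy_agent, toy_validator)
```

With these toy stand-ins the first attempt fails validation and the second succeeds. The attempt budget is a design choice of this sketch; the abstract does not say how many retries the authors allow.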
