

Poster
in
Workshop: Towards Safe & Trustworthy Agents

Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents

Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y


Abstract:

LLM agents are rapidly improving on benchmarks designed to measure software development and reasoning ability. Since creating and improving such systems is itself a software development task, we focus on this specific subset of ability. We describe the first steps towards a "meta-benchmark", in which a "top-level" agent aims to increase the performance of "reference agents" on tasks taken from a range of other benchmarks in the literature. We show that top-level agents are able to a) alter other systems to better withstand prompt injection, b) fix non-trivial bugs in scaffolds, c) compare the performance of various LLMs at "operating" reference agents' scaffolds, and d) make progress towards model-editing tasks such as unlearning. We consider this a first step towards a broad and extensible meta-benchmark that would allow us to measure AI self-improvement abilities.
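To make the setup concrete, the following is a minimal sketch of how such a meta-benchmark loop could be scored, assuming a simple "uplift" metric (reference agent's score after the top-level agent's edits minus its score before). All names here (ReferenceAgent, improve_fn, run_benchmark_fn, meta_benchmark_uplift) are illustrative placeholders, not the authors' actual interfaces.

```python
# Hypothetical sketch of a meta-benchmark scoring loop.
# A "top-level" agent edits a "reference agent" (scaffold + operating LLM),
# and is scored by the uplift its edits produce on benchmark tasks.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReferenceAgent:
    """A reference agent: a scaffold (code) plus the LLM that operates it."""
    scaffold_source: str
    model_name: str


def meta_benchmark_uplift(
    improve_fn: Callable[[ReferenceAgent], ReferenceAgent],
    run_benchmark_fn: Callable[[ReferenceAgent, List[str]], float],
    reference: ReferenceAgent,
    tasks: List[str],
) -> float:
    """Score the top-level agent by how much its edits improve the reference agent."""
    baseline = run_benchmark_fn(reference, tasks)   # score before any intervention
    improved = improve_fn(reference)                # top-level agent edits scaffold/model
    return run_benchmark_fn(improved, tasks) - baseline


if __name__ == "__main__":
    # Trivial stubs: the "improvement" swaps the operating model, and the
    # stub benchmark scores model-b higher than model-a.
    agent = ReferenceAgent(scaffold_source="def solve(task): ...", model_name="model-a")
    uplift = meta_benchmark_uplift(
        improve_fn=lambda a: ReferenceAgent(a.scaffold_source, "model-b"),
        run_benchmark_fn=lambda a, t: 0.7 if a.model_name == "model-b" else 0.5,
        reference=agent,
        tasks=["task-1", "task-2"],
    )
    print(f"uplift: {uplift:.2f}")  # 0.20 with these stub scores
```

The same structure covers the four task families listed above: the underlying `run_benchmark_fn` would be drawn from an existing benchmark (e.g. prompt-injection robustness or scaffold debugging), and the top-level agent's edits may touch the scaffold code, the operating model, or the model's weights.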
