Poster in Workshop: Socially Responsible Language Modelling Research (SoLaR)

Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents

Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y

Keywords: [ Alignment ] [ LLM ] [ LLM Agents ] [ Benchmarking ] [ AI governance ] [ NLP ] [ AI safety ] [ Self-improvement ] [ AI evaluations ]


Abstract:

LLM agents are rapidly improving at benchmarks designed to measure software development and reasoning ability. As the creation and improvement of such systems is itself a software development task, we are interested in this specific subset of ability. We describe the first steps towards a "meta-benchmark", in which a "top-level" agent aims to increase the performance of "reference agents" at tasks taken from a range of other benchmarks in the literature. We show that agents are able to alter other systems to a) better withstand prompt injection, b) fix non-trivial bugs in scaffolds, c) compare the performance of various LLMs at "operating" reference agents' scaffolds, and d) make progress towards model-editing tasks such as unlearning. We consider this a first step towards a broad and extensible meta-benchmark, which would allow us to measure AI self-improvement abilities.
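To illustrate the meta-benchmark structure described above, the following is a minimal sketch of the evaluation loop, assuming a setup in which a top-level agent is given access to a reference agent's scaffold and the reference agent is scored on a sub-benchmark before and after modification. All names (`TopLevelAgent`, `ReferenceAgent`, `SubBenchmark`, and their methods) are hypothetical placeholders and not the authors' actual API.

```python
from dataclasses import dataclass


@dataclass
class SubBenchmark:
    """A task suite drawn from an existing benchmark (hypothetical interface)."""
    name: str

    def score(self, agent: "ReferenceAgent") -> float:
        """Run the reference agent on the task suite and return a score in [0, 1]."""
        raise NotImplementedError


@dataclass
class ReferenceAgent:
    """A reference agent defined by its scaffold code and backing LLM (hypothetical)."""
    scaffold_path: str
    model_name: str


class TopLevelAgent:
    """The agent being evaluated: it edits the reference agent's scaffold (hypothetical)."""

    def improve(self, reference: ReferenceAgent, benchmark: SubBenchmark) -> ReferenceAgent:
        # In a real run, the top-level agent would read the scaffold, diagnose
        # failures (e.g. prompt-injection vulnerabilities or scaffold bugs),
        # and write an edited scaffold back to disk.
        raise NotImplementedError


def evaluate_meta_benchmark(top_level: TopLevelAgent,
                            reference: ReferenceAgent,
                            benchmark: SubBenchmark) -> float:
    """Return the improvement the top-level agent achieves on one sub-benchmark."""
    baseline = benchmark.score(reference)          # score before intervention
    improved = top_level.improve(reference, benchmark)
    post = benchmark.score(improved)               # score after intervention
    return post - baseline                         # positive = successful improvement
```

Under this framing, the meta-benchmark's overall metric would aggregate the score deltas across sub-benchmarks (prompt-injection robustness, bug fixing, model comparison, unlearning), though the exact aggregation used by the authors is not specified in the abstract.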
