Poster in Workshop: Language Gamification

Beyond Benchmarking: Automated Capability Discovery via Model Self-Exploration

Cong Lu · Shengran Hu · Jeff Clune


Abstract:

Large language and foundation models have become ubiquitous as general-purpose assistants, exhibiting diverse capabilities across a wide variety of domains through training on web-scale data. Because of the vast pre-training corpora and the range of fine-tuning techniques involved in creating such models, it is often difficult to precisely characterize the full range of capabilities and risks any new model possesses. Existing evaluation strategies typically involve extensive user testing and automated benchmarking suites targeting broad use cases; these require significant manual effort and specialized domain knowledge, and they have become increasingly challenging to scale as models grow more capable. In this paper, we explore whether the manual component of model evaluation can be removed entirely by asking language models themselves to probe their capabilities in an open-ended manner. We introduce Automated Capability Discovery (ACD), a framework that enables foundation models to self-explore and report their capabilities in the form of human-interpretable tasks. We demonstrate ACD on several state-of-the-art language models, including GPT-4o, Claude 3.5 Sonnet, and Llama 3.1-405B, showing that it can automatically reveal thousands of capabilities per model. ACD uncovers capabilities at a breadth that would be rare for any one person to fully understand and evaluate, while also highlighting surprising successes and failures. With the most capable models, ACD also leads to models posing themselves deep questions about alien communication systems and the nature of consciousness. ACD relies on self-evaluation to assess performance on each task; we validate these self-assessments through extensive human surveys and find high agreement with human evaluators, achieving an F1 score of 0.9 in the case of GPT-4o. ACD represents an exciting first step towards fully scalable and automatic evaluation of the most powerful AI systems.
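For intuition, here is a minimal sketch of the kind of propose-attempt-judge loop the abstract describes. This is an illustration under assumptions, not the authors' implementation: the `query_model` function, the prompt wording, and the success/failure judging format are all hypothetical placeholders.

```python
import json


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a foundation model API."""
    raise NotImplementedError("Replace with a real model call.")


def discover_capabilities(num_rounds: int = 10) -> list[dict]:
    """Repeatedly ask a model to propose, attempt, and self-judge tasks."""
    discovered: list[dict] = []
    for _ in range(num_rounds):
        # 1. Self-exploration: propose a new, human-interpretable task,
        #    distinct from everything discovered so far.
        proposal = query_model(
            "Propose a new task that probes one of your capabilities, "
            "distinct from these tasks:\n"
            + json.dumps([t["task"] for t in discovered])
        )
        # 2. Attempt the proposed task.
        attempt = query_model("Complete this task:\n" + proposal)
        # 3. Self-evaluation: judge whether the attempt succeeded.
        verdict = query_model(
            "Task: " + proposal + "\nAttempt: " + attempt
            + "\nDid the attempt succeed? Answer 'success' or 'failure'."
        )
        discovered.append({
            "task": proposal,
            "attempt": attempt,
            "success": verdict.strip().lower().startswith("success"),
        })
    return discovered
```

The returned list of task/attempt/verdict records corresponds to the human-interpretable capability reports the abstract mentions; in practice, the self-judged verdicts are what the paper validates against human evaluators.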
