

Poster in Workshop: Language Gamification

AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions

Aidan McLaughlin · Anuja Uppuluri · James Campbell · Richard Ren


Abstract:

AidanBench evaluates large language models (LLMs) on their ability to generate novel ideas in response to open-ended questions, focusing on creativity, reliability, contextual attention, and instruction following. Unlike benchmarks with clear-cut answers, AidanBench assesses models on more open-ended, real-world tasks. In evaluations of several state-of-the-art LLMs, AidanBench scores show only weak correlation with existing benchmarks, offering a more nuanced view of model performance in open-ended scenarios.
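The abstract does not spell out the scoring mechanism. As a rough illustration only, the sketch below shows one way an open-ended novelty-generation loop could be implemented: repeatedly prompt a model for an answer it has not yet given, and stop once a new answer is no longer sufficiently distinct from earlier ones, with distinctness measured by embedding cosine similarity. The model names, threshold value, and helper functions here are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of an open-ended novelty loop; not the authors' implementation.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def embed(text: str) -> np.ndarray:
    """Return an embedding vector for `text` (embedding model choice is illustrative)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)


def novelty(answer: str, previous: list[str]) -> float:
    """Novelty = 1 minus the maximum cosine similarity to any previous answer."""
    if not previous:
        return 1.0
    a = embed(answer)
    sims = [
        float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        for b in map(embed, previous)
    ]
    return 1.0 - max(sims)


def run_question(question: str, model: str = "gpt-4o", threshold: float = 0.15) -> list[str]:
    """Keep asking the model for answers it has not yet given; stop when the
    newest answer is no longer sufficiently novel. Returns the accepted answers."""
    answers: list[str] = []
    while True:
        prompt = (
            f"Question: {question}\n"
            f"Previously given answers: {answers}\n"
            "Give one new answer that differs from all previous answers."
        )
        reply = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        candidate = reply.choices[0].message.content.strip()
        if novelty(candidate, answers) < threshold:  # termination: ideas have run dry
            return answers
        answers.append(candidate)
```

Under this framing, a model's score on a question could simply be the number of accepted answers before termination; the threshold of 0.15 is an arbitrary placeholder and would need tuning in practice.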
