Poster in Workshop: Language Gamification
AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions
Aidan McLaughlin · Anuja Uppuluri · James Campbell · Richard Ren
Abstract:
AidanBench evaluates large language models (LLMs) on their ability to generate novel ideas in response to open-ended questions, focusing on creativity, reliability, contextual attention, and instruction following. Unlike benchmarks with clear-cut answers, AidanBench assesses models on open-ended, real-world tasks. In tests of several state-of-the-art LLMs, AidanBench correlates only weakly with existing benchmarks while offering a more nuanced view of model performance in open-ended scenarios.
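The abstract does not spell out the scoring procedure. As a rough illustration only, the sketch below shows one plausible way an open-ended novelty evaluation of this kind could be structured: repeatedly ask a model for an answer it has not yet given, score each candidate for novelty (embedding dissimilarity to prior answers) and coherence (a judge score), and count how many answers it produces before failing either check. The helpers `generate_answer`, `embed`, and `judge_coherence` are hypothetical placeholders, and the thresholds are arbitrary; this is not the authors' implementation.

```python
from typing import Callable, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Plain-Python cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def open_ended_novelty_count(
    question: str,
    generate_answer: Callable[[str, List[str]], str],  # hypothetical: model proposes an answer not in the list
    embed: Callable[[str], List[float]],                # hypothetical: text -> embedding vector
    judge_coherence: Callable[[str, str], float],       # hypothetical: judge model returns a 0-1 coherence score
    novelty_threshold: float = 0.15,                    # assumed cutoff, not from the paper
    coherence_threshold: float = 0.5,                   # assumed cutoff, not from the paper
    max_rounds: int = 50,
) -> int:
    """Count distinct, coherent answers produced before the model starts
    repeating itself or becomes incoherent (illustrative sketch only)."""
    answers: List[str] = []
    embeddings: List[List[float]] = []
    for _ in range(max_rounds):
        candidate = generate_answer(question, answers)
        vec = embed(candidate)
        # Novelty: distance to the closest previously accepted answer in embedding space.
        novelty = min((1.0 - cosine_similarity(vec, e) for e in embeddings), default=1.0)
        if novelty < novelty_threshold or judge_coherence(question, candidate) < coherence_threshold:
            break  # the model has run out of sufficiently new, coherent ideas
        answers.append(candidate)
        embeddings.append(vec)
    return len(answers)
```

Under this framing, a higher count means the model kept producing answers that were both new and coherent for longer, which is one way to turn an open-ended question into a single comparable number.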