Poster in Workshop: Language Gamification
AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions
Aidan McLaughlin · Anuja Uppuluri · James Campbell · Richard Ren
Abstract:
AidanBench evaluates large language models (LLMs) on their ability to generate novel ideas in response to open-ended questions, focusing on creativity, reliability, contextual attention, and instruction following. Unlike benchmarks with clear-cut answers, AidanBench assesses models on open-ended, real-world tasks. In tests of several state-of-the-art LLMs, AidanBench correlates only weakly with existing benchmarks while offering a more nuanced view of model performance in open-ended scenarios.
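The abstract does not spell out the scoring procedure. As a rough illustration only, the sketch below shows one plausible way an open-ended novelty evaluation of this kind could be structured: repeatedly ask a model for an answer it has not yet given, score each candidate for novelty (embedding dissimilarity to prior answers) and coherence (a judge score), and count how many answers it produces before failing either check. The helpers `generate_answer`, `embed`, and `judge_coherence` are hypothetical placeholders, and the thresholds are arbitrary; this is not the authors' implementation.

```python
from typing import Callable, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Plain-Python cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def open_ended_novelty_count(
    question: str,
    generate_answer: Callable[[str, List[str]], str],  # hypothetical: model proposes an answer not in the list
    embed: Callable[[str], List[float]],                # hypothetical: text -> embedding vector
    judge_coherence: Callable[[str, str], float],       # hypothetical: judge model returns a 0-1 coherence score
    novelty_threshold: float = 0.15,                    # assumed cutoff, not from the paper
    coherence_threshold: float = 0.5,                   # assumed cutoff, not from the paper
    max_rounds: int = 50,
) -> int:
    """Count distinct, coherent answers produced before the model starts
    repeating itself or becomes incoherent (illustrative sketch only)."""
    answers: List[str] = []
    embeddings: List[List[float]] = []
    for _ in range(max_rounds):
        candidate = generate_answer(question, answers)
        vec = embed(candidate)
        # Novelty: distance to the closest previously accepted answer in embedding space.
        novelty = min((1.0 - cosine_similarity(vec, e) for e in embeddings), default=1.0)
        if novelty < novelty_threshold or judge_coherence(question, candidate) < coherence_threshold:
            break  # the model has run out of sufficiently new, coherent ideas
        answers.append(candidate)
        embeddings.append(vec)
    return len(answers)
```

Under this framing, a higher count means the model kept producing answers that were both new and coherent for longer, which is one way to turn an open-ended question into a single comparable number.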