

Poster in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

Dimensions of Generative AI Evaluation Design

Alex Dow · Jennifer Wortman Vaughan · Solon Barocas · Chad Atalla · Alexandra Chouldechova · Hanna Wallach

Keywords: [ generative AI ] [ AI ] [ risks ] [ capabilities ] [ evaluation design ] [ evaluation ]


Abstract:

Evaluating the capabilities and risks of generative AI (GenAI) models and systems is crucial for their successful development, deployment, and adoption, but there are few well-understood principles or guidelines for ensuring effective evaluations. To address this, we propose a set of general dimensions that capture critical choices involved in GenAI evaluation design. These dimensions include the evaluation setting, object of the evaluation, task type, input source, interaction style, duration, metric type, and scoring method. By situating evaluations within these dimensions, we aim to guide decision-making during GenAI evaluation design and provide a structure for comparing different evaluations. We illustrate the utility of these dimensions through examples, including evaluations of fairness and biological threats. Our proposal encourages a more methodical and explicit approach to GenAI evaluation design.
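To make the proposal concrete, the sketch below records an evaluation's position along the eight dimensions named in the abstract as a simple data structure. It is illustrative only: the field names, enum values, and the example fairness evaluation are assumptions for exposition, not definitions or examples taken from the paper.

from dataclasses import dataclass
from enum import Enum

class Setting(Enum):
    # Assumed distinction: evaluating a model in isolation vs. a deployed system.
    MODEL = "model"
    SYSTEM = "system"

class MetricType(Enum):
    AUTOMATED = "automated"
    HUMAN_JUDGMENT = "human_judgment"

@dataclass
class EvaluationDesign:
    """One way to record where an evaluation sits along the proposed dimensions."""
    setting: Setting
    object_of_evaluation: str   # e.g., a capability or a risk such as fairness
    task_type: str
    input_source: str           # e.g., static benchmark vs. human- or model-generated prompts
    interaction_style: str      # e.g., single-turn vs. multi-turn
    duration: str
    metric_type: MetricType
    scoring_method: str

# Hypothetical example: situating a fairness evaluation along these dimensions.
fairness_eval = EvaluationDesign(
    setting=Setting.SYSTEM,
    object_of_evaluation="fairness-related harms",
    task_type="open-ended generation",
    input_source="curated prompt set",
    interaction_style="single-turn",
    duration="fixed-length",
    metric_type=MetricType.HUMAN_JUDGMENT,
    scoring_method="rubric-based annotation",
)

Recording evaluations in a shared structure like this is one way to support the comparison across evaluations that the abstract describes; the specific representation is a design choice, not part of the paper's proposal.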
