NeurIPS GenAI Evaluation Maturity Framework (GEMF) to assess and improve GenAI Evaluations

Oral in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

Yilin Zhang · Frank J. Kanayet

Keywords: [ reliability ] [ framework ] [ difficulty ] [ representativity ] [ accuracy ] [ Generative AI ] [ diversity ] [ evaluation ] [ efficiency ] [ robustness ]

Abstract:

We introduce a general framework to assess and improve the maturity of GenAI evaluations across two areas, Prompts and Labels, each with multiple dimensions. The GEMF assessment provides a report card with maturity levels for each prompt and label dimension, a comprehensive summary of the status of the GenAI evaluation, and suggested directions for improvement.
