Oral in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI
GenAI Evaluation Maturity Framework (GEMF) to assess and improve GenAI Evaluations
Yilin Zhang · Frank J. Kanayet
Keywords: [ reliability ] [ framework ] [ difficulty ] [ representativity ] [ accuracy ] [ Generative AI ] [ diversity ] [ evaluation ] [ efficiency ] [ robustness ]
We introduce a general framework to assess and improve the maturity of GenAI evaluations across two areas, Prompts and Labels, each with multiple dimensions. The GEMF assessment produces a report card with a maturity level for each prompt and label dimension, a comprehensive summary of the status of the GenAI evaluation, and suggested directions for improvement.