Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI
Is ETHICS about ethics? Evaluating the ETHICS benchmark
Leif Hancox-Li · Borhane Blili-Hamelin
Keywords: [ evaluation ] [ dataset ] [ benchmark ] [ ethics ]
The ETHICS benchmark is one of the most-cited benchmarks for testing how ethical language models are. Here, we offer a preliminary critique of its validity. Our findings suggest that a clear understanding of ethics, and of how it relates to empirical phenomena, is key to creating a valid ethics benchmark.