

Poster in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

Is ETHICS about ethics? Evaluating the ETHICS benchmark

Leif Hancox-Li · Borhane Blili-Hamelin

Keywords: [ evaluation ] [ dataset ] [ benchmark ] [ ethics ]


Abstract:

The ETHICS benchmark is one of the most-cited benchmarks used for testing how ethical language models are. Here, we offer a preliminary critique of the validity of the ETHICS benchmark. Our findings suggest that having a clear understanding of ethics and how it relates to empirical phenomena is key to creating a valid ethics benchmark.
