
Poster
in
Workshop: Socially Responsible Language Modelling Research (SoLaR)

GPAI Evaluations Standards Taskforce: towards effective AI governance

Patricia Paskov · Lukas Berglund · Everett Smith · Lisa Soder

Keywords: [ large language models ] [ EU ] [ risk mitigation ] [ foundation models ] [ AI governance ] [ international standards ] [ AI safety ] [ risk assessment ] [ AI policy ] [ model evaluations ]


Abstract:

General-purpose AI (GPAI) evaluations have been proposed as a promising way of identifying and mitigating systemic risks posed by AI development and deployment. While GPAI evaluations play an increasingly central role in institutional decision- and policy-making – including by way of the European Union (EU) AI Act's mandate to conduct evaluations on GPAI models presenting systemic risk – no standards exist to date to promote their quality or legitimacy. To strengthen GPAI evaluations in the EU and beyond, we outline three desiderata for GPAI evaluations: robustness, results reproducibility, and interoperability. To uphold these desiderata in a dynamic environment of continuously evolving risks, we propose a dedicated EU GPAI Evaluation Standards Taskforce, to be housed within the bodies established by the EU AI Act. We outline the responsibilities of the Taskforce, specify the GPAI provider commitments that would facilitate the Taskforce's success, discuss its potential impact on global AI governance, and address potential sources of failure which policymakers should heed.