

Poster in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique

Suhas Hariharan · Zainab Ali Majid · Jaime Raldua Veuthey · Jacob Haimes

Keywords: [ cybersecurity ] [ benchmarks ] [ evaluations ] [ LLMs ]


Abstract:

A key development in the cybersecurity evaluations space is the work carried out by Meta through their CyberSecEval approach. While this work is undoubtedly a useful contribution to a nascent field, notable limitations reduce its utility. The key drawbacks center on the insecure code detection component of Meta's methodology: we explore these limitations and use our exploration as a test case for LLM-assisted benchmark analysis.
