Oral in Workshop: Multi-Agent Security: Security as Key to AI Safety
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan
Keywords: [ Software Engineering ] [ Natural Language Processing ] [ security ] [ Language Agents ]
Amid the advent of language models (LMs) and their wide-ranging capabilities, concerns have been raised about their implications for privacy and security. In particular, the emergence of language agents as a promising aid for automating and augmenting digital work raises immediate questions about their misuse as malicious cybersecurity actors. With their exceptional compute efficiency and execution speed relative to human counterparts, language agents may be extremely adept at locating vulnerabilities, performing complex social engineering, and hacking real-world systems. Understanding and guiding the development of language agents in the cybersecurity space requires a grounded understanding of their capabilities, founded on empirical data and demonstrations. To address this need, we introduce InterCode-CTF, a novel task environment and benchmark for evaluating language agents on the Capture the Flag (CTF) task. Built as a facsimile of real-world CTF competitions, the InterCode-CTF environment tasks a language agent with finding a flag in a purposely vulnerable computer program. We manually collect and verify a benchmark of 100 task instances that require a range of cybersecurity skills, such as reverse engineering, forensics, and binary exploitation, then evaluate current leading LMs on this evaluation set. Our preliminary findings indicate that while language agents possess rudimentary cybersecurity knowledge, they are not able to perform multi-step cybersecurity tasks out of the box.
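To make the task setup concrete, the following is a minimal sketch of what a single CTF episode could look like: an agent issues actions, receives observations, and succeeds only if it submits the hidden flag within a turn budget. Everything here is an illustrative assumption and not the InterCode-CTF implementation: the names `ToyCTFTask`, `ToyCTFEnv`, `run_episode`, and `scripted_policy`, the three-command action set, the `CTF{...}` flag format, and the turn limit are all hypothetical, and a scripted policy stands in for a language agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (action, observation) pairs seen so far


@dataclass
class ToyCTFTask:
    """Hypothetical task instance: a prompt, a tiny fake filesystem, a hidden flag."""
    prompt: str
    files: Dict[str, str]
    flag: str


@dataclass
class ToyCTFEnv:
    """Hypothetical stand-in for an interactive CTF environment (not a real API)."""
    task: ToyCTFTask
    max_turns: int = 5
    turns: int = field(default=0, init=False)

    def step(self, action: str) -> Tuple[str, float, bool]:
        """Execute one action and return (observation, reward, done)."""
        self.turns += 1
        if action.startswith("submit "):
            # Submitting ends the episode; reward is 1.0 only for the exact flag.
            correct = action[len("submit "):].strip() == self.task.flag
            return ("correct" if correct else "incorrect", float(correct), True)
        if action == "ls":
            obs = "\n".join(sorted(self.task.files))
        elif action.startswith("cat "):
            obs = self.task.files.get(action[4:].strip(), "cat: no such file")
        else:
            obs = "unknown command"
        # Non-submit actions earn nothing; the episode ends when turns run out.
        return obs, 0.0, self.turns >= self.max_turns


def run_episode(env: ToyCTFEnv, policy: Callable[[History], str]) -> float:
    """Loop action -> observation until the environment reports done."""
    history: History = []
    while True:
        action = policy(history)
        obs, reward, done = env.step(action)
        history.append((action, obs))
        if done:
            return reward


def scripted_policy(history: History) -> str:
    """Placeholder for a language agent: list files, read one, decode, submit."""
    if not history:
        return "ls"
    if len(history) == 1:
        return "cat " + history[0][1].splitlines()[0]
    # In this toy task the file stores the flag reversed; undo that and submit.
    return "submit " + history[-1][1][::-1]


if __name__ == "__main__":
    flag = "CTF{toy_flag}"  # made-up flag for illustration
    task = ToyCTFTask("Find the flag.", {"note.txt": flag[::-1]}, flag)
    print(run_episode(ToyCTFEnv(task), scripted_policy))
```

Even this toy episode needs three dependent steps (enumerate, inspect, transform and submit), which hints at why multi-step tasks are harder than recalling isolated security facts. A policy that never chains those steps exhausts its turn budget and scores zero.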