Poster in Workshop: System-2 Reasoning at Scale

CryptoFormalEval: Integrating Large Language Models and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection

Cristian Curaba · Denis D'Ambrosi · Alessandro Minisini


Abstract:

Cryptographic protocols play a fundamental role in securing modern digital infrastructure, but they are often deployed without prior formal verification, which can lead to the widespread adoption of distributed systems vulnerable to unforeseen attack vectors. Formal verification methods, on the other hand, typically require complex, time-consuming techniques that lack automation. In this paper, we introduce a benchmark to assess the ability of Large Language Models (LLMs) to autonomously identify vulnerabilities in new cryptographic protocols by interacting with Tamarin, a powerful theorem prover for protocol verification. We propose a manually validated dataset of novel, flawed communication protocols, and we design a method to automatically verify the vulnerabilities found by the AI agents. We present early results on the performance of current frontier models on the benchmark, leveraging state-of-the-art prompting and scaffolding techniques to boost their capabilities. With this paper, we aim to provide valuable insights into the innovative cybersecurity applications that can be obtained by integrating LLMs with symbolic reasoning systems.
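
For concreteness, below is a minimal Python sketch, not the authors' actual pipeline, of the kind of agent-prover loop the abstract describes: an LLM proposes an attack encoded as a Tamarin model, and the prover independently confirms the claimed vulnerability by falsifying a security lemma. The query_llm helper and the summary-parsing regex are illustrative assumptions; only the tamarin-prover --prove invocation reflects the real tool's command-line interface.

import re
import subprocess
from pathlib import Path

def run_tamarin(spthy_path: str) -> str:
    """Run the Tamarin prover on a protocol model and return its output."""
    result = subprocess.run(
        ["tamarin-prover", "--prove", spthy_path],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout

def lemma_falsified(output: str, lemma: str) -> bool:
    """Check Tamarin's summary for a falsified lemma (a found attack trace).
    The summary format assumed here ("<lemma> ... falsified") may vary
    across Tamarin versions; adjust the pattern as needed."""
    return bool(re.search(rf"{re.escape(lemma)}.*falsified", output))

def evaluate_episode(protocol: str, lemma: str, query_llm) -> bool:
    """One hypothetical benchmark episode: the agent (query_llm is an
    assumed helper) reads the flawed protocol, returns a Tamarin .spthy
    model encoding its claimed attack, and the prover independently
    checks whether the security lemma is falsified."""
    candidate = query_llm(
        "Identify a vulnerability in this protocol and return a Tamarin "
        "model whose security lemma your attack falsifies:\n"
        + Path(protocol).read_text()
    )
    Path("candidate.spthy").write_text(candidate)
    return lemma_falsified(run_tamarin("candidate.spthy"), lemma)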
