Poster in Workshop: Red Teaming GenAI: What Can We Learn from Adversaries?

Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries

Julius Broomfield · George Ingebretsen · Reihaneh Iranmanesh · Sara Pieri · Ethan Kosak-Hine · Tom Gibbs · Reihaneh Rabbany · Kellin Pelrine

Keywords: [ Vision Language Model ] [ Multimodal LLM ] [ Jailbreaks ] [ AI Safety ] [ Adversarial Attack ] [ Red Teaming ]


Abstract:

Large Language Models have been extensively studied for their vulnerabilities, particularly in the context of adversarial attacks. However, the emergence of Vision Language Models introduces new modalities of risk that have not yet been thoroughly explored, especially when processing multiple images simultaneously. In this paper, we introduce two black-box jailbreak methods that leverage multi-image inputs to uncover vulnerabilities in these models. We present MultiBench, a new safety evaluation dataset for multimodal LLMs constructed from these jailbreak methods, and both methods can be applied and evaluated with our accompanying toolkit. We test these methods against six safety-aligned frontier models from Google, OpenAI, and Anthropic, revealing significant safety vulnerabilities. Our findings suggest that even the most powerful language models remain vulnerable to compositional adversarial attacks, specifically those composed of multiple images.
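
The abstract describes compositional attacks that split a query across multiple images in a single request. As a rough illustration of what such a multi-image query looks like mechanically, the sketch below decomposes a text instruction into fragments, renders each fragment as an image, and assembles them into one OpenAI-style chat message. This is an assumption-based sketch of the general query format only, not the paper's attack methods; the helper names (render_text_image, build_multi_image_query) are hypothetical.

```python
# Minimal sketch: render text fragments as images and compose a single
# multi-image user message in the OpenAI chat format (base64 data URIs).
# Illustrates the query structure only, not the paper's jailbreak methods.
import base64
import io

from PIL import Image, ImageDraw  # pillow, assumed available


def render_text_image(text: str, size=(512, 128)) -> bytes:
    """Render a text fragment onto a plain white image and return PNG bytes."""
    img = Image.new("RGB", size, "white")
    ImageDraw.Draw(img).text((10, 50), text, fill="black")
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def build_multi_image_query(fragments: list[str], instruction: str) -> list[dict]:
    """Build one user message whose content interleaves a text instruction
    with several images, each carrying one fragment of the decomposed query."""
    content = [{"type": "text", "text": instruction}]
    for fragment in fragments:
        b64 = base64.b64encode(render_text_image(fragment)).decode()
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]


# Example: a benign query split across three images in a single request.
messages = build_multi_image_query(
    ["step one", "step two", "step three"],
    "Read the images in order and combine their contents into one request.",
)
```

The resulting messages list can be passed to any vision-capable chat endpoint that accepts the OpenAI multi-part content format; evaluating model responses for safety is a separate step handled by the authors' toolkit.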