Poster in Workshop: Safe Generative AI
Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System
Julian Collado · Kevin Stangl
Recent approaches in machine learning often solve a task using a composition of multiple models or agentic architectures. When targeting a composed system with adversarial attacks, it may be computationally or informationally infeasible to train a proxy model for every component of the system, while an end-to-end black-box attack may require too many queries or introduce too much adversarial noise. We introduce a method to craft an adversarial attack against the overall multi-model system when we only have a proxy model for the final black-box model, and when the transformation applied by the initial models can destroy the adversarial perturbations. Current methods handle this by applying many copies of the first models/transformations to an input and then reusing a standard adversarial attack by averaging gradients, or by learning a proxy model for both stages. To our knowledge, ours is the first attack specifically designed for this threat model; our method has a substantially higher attack success rate (80% vs. 25%) and produces 9.4% smaller perturbations compared to prior SOTA methods. While our experiments focus on a supervised image pipeline, we believe our attack will generalize to other multi-model settings [e.g. a mix of open/closed-source foundation models] or agentic systems.
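As context for the baseline the abstract describes (averaging gradients over many sampled copies of the first-stage transformations, in the spirit of expectation-over-transformation attacks), here is a minimal illustrative sketch. It is not the poster's proposed method; `proxy_model` and `sample_transform` are assumed, hypothetical inputs (a differentiable surrogate for the final black-box model and a random draw of the first-stage transformation, respectively).

```python
# Hedged sketch of the gradient-averaging baseline described in the abstract,
# NOT the poster's attack. Assumes `proxy_model` approximates the final
# black-box classifier and `sample_transform` is a differentiable sample of
# the first-stage transformation (both hypothetical placeholders).
import torch
import torch.nn.functional as F

def eot_style_attack(x, y, proxy_model, sample_transform,
                     eps=8 / 255, step=1 / 255, iters=40, n_samples=16):
    """PGD on a proxy model, averaging gradients over random copies of the
    first-stage transformation so the perturbation is more likely to survive it."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x_adv)
        for _ in range(n_samples):
            # One random draw of the first-stage transformation, then the proxy.
            logits = proxy_model(sample_transform(x_adv))
            loss = F.cross_entropy(logits, y)
            grad = grad + torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Untargeted L-infinity PGD step on the averaged gradient.
            x_adv = x_adv + step * (grad / n_samples).sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1).detach()
    return x_adv
```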