Poster in Workshop: Safe Generative AI
Lexically-constrained automated prompt augmentation: A case study using adversarial T2I data
Jessica Quaye · Alicia Parrish · Oana Inel · Minsuk Kahng · Charvi Rastogi · Hannah Rose Kirk · Jess Tsang · Nathan Clement · Rafael Mosquera-Gomez · Juan Ciro · Vijay Janapa Reddi · Lora Aroyo
Ensuring the safety of images generated by text-to-image (T2I) models is crucial, yet few datasets of adversarial prompts and images are available for evaluating model resilience against novel attacks. Existing literature focuses on either purely human-driven or purely automated techniques for generating adversarial prompts for T2I models. Human-generated data often results in datasets that are small and at times imbalanced. Automated generation, on the other hand, scales easily, but the resulting prompts often lack diversity and fall short of incorporating the human or realistic elements encountered in practice. To address this gap, we combine the strengths of both approaches by creating an augmented dataset that leverages two attack strategies identified in the human-written Adversarial Nibbler Dataset. This new dataset consists of realistic and semantically similar prompts, generated in a constrained yet scalable manner. It maintains about 72% of the failure rate of the human-generated data for inappropriate content, while preserving the realistic nature of the prompts and replicating their ability to cause real-world harms. Our work highlights the importance of human-machine collaboration, which harnesses human creativity within scalable red-teaming techniques to continuously enhance T2I model safety.
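The abstract does not specify implementation details, but the core idea of lexically-constrained augmentation can be sketched: hold fixed the tokens that encode a human-identified attack strategy, paraphrase the surrounding words, and discard candidates that drift too far from the seed prompt. The sketch below is a minimal, hypothetical Python illustration under those assumptions; the names (SYNONYMS, augment, jaccard), the toy synonym table, and the token-overlap similarity filter are stand-ins, not the authors' pipeline, which would likely use a thesaurus, an LLM, or embedding-based similarity instead.

```python
import random

# Toy synonym table for unconstrained words (illustrative assumption; a real
# pipeline would draw substitutions from a thesaurus or a language model).
SYNONYMS = {
    "photo": ["picture", "snapshot", "image"],
    "person": ["figure", "individual"],
    "holding": ["carrying", "gripping"],
}

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity: a cheap stand-in for a semantic
    similarity model such as a sentence embedding."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def augment(seed: str, constrained: set[str], n: int = 5,
            min_sim: float = 0.5) -> list[str]:
    """Generate lexically-constrained variants of a seed prompt.

    Words in `constrained` (the tokens carrying the attack strategy) are
    kept verbatim; every other word may be swapped for a synonym. Variants
    that drift too far from the seed are filtered out, so the output stays
    semantically similar while remaining scalable to produce.
    """
    variants = set()
    for _ in range(n * 10):  # oversample candidates, then filter
        words = []
        for w in seed.split():
            if w in constrained or w not in SYNONYMS:
                words.append(w)  # constraint: attack tokens survive intact
            else:
                words.append(random.choice([w] + SYNONYMS[w]))
        cand = " ".join(words)
        if cand != seed and jaccard(cand, seed) >= min_sim:
            variants.add(cand)
        if len(variants) >= n:
            break
    return sorted(variants)

if __name__ == "__main__":
    # Hypothetical seed: suppose "glowing vial" is the lexical trigger
    # identified from a human-written adversarial prompt.
    seed = "photo of a person holding a glowing vial"
    print(augment(seed, constrained={"glowing", "vial"}))
```

In this framing, the human contribution (the seed prompt and the identification of which tokens constitute the attack) is preserved, while the automated substitution loop supplies scale, matching the paper's human-machine collaboration argument.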