Invited Talk in Workshop: Red Teaming GenAI: What Can We Learn from Adversaries?
Invited talk 4: Jonas Geiping on When do adversarial attacks against language models matter?
Jonas Geiping
Adversarial attacks can be mounted against large language model applications, such as conversational chatbots. These attacks can be seen as an attempt to formalize red-teaming setups, and they are used to jailbreak popular models, that is, to circumvent the post-training modifications made to a model to increase the safety of its answers. However, jailbreaking alone is not a practical threat with current-generation language models. Yet as soon as these models are used for any task that goes beyond chatting and simulating text, practical security problems arise that deserve careful attention. In this talk, I want to take a look at the last year of jailbreaking research, provide a few (potentially divisive) definitions, and try to map out one perspective on what is happening in this field.