Invited Talk
in
Workshop: Backdoors in Deep Learning: The Good, the Bad, and the Ugly
Decoding Backdoors in LLMs and Their Implications
Bo Li
In the rapidly evolving landscape of artificial intelligence, generative AI has emerged as a powerful and transformative technology with significant potential across various applications, such as medicine, finance, and autonomous driving. However, with this immense potential comes the imperative to ensure the safety and trustworthiness of generative models before their large-scale deployment.
In particular, as large language models (LLMs) become increasingly prevalent in real-world applications, understanding and mitigating the risks associated with potential backdoors is paramount. This talk will critically examine backdoors embedded in LLMs and explore their implications for the security and reliability of these models across different applications. Specifically, I will first present different strategies for injecting backdoors into LLMs and into a series of chain-of-thought (CoT) frameworks. I will then discuss potential defenses against known and unknown backdoors in LLMs, and provide an overview of how to assess, improve, and certify the resilience of LLMs against potential backdoors.