Invited Talk
in
Workshop: Backdoors in Deep Learning: The Good, the Bad, and the Ugly
Decoding Backdoors in LLMs and Their Implications
Bo Li
In the rapidly evolving landscape of artificial intelligence, generative AI has emerged as a powerful and transformative technology with significant potential across various applications, such as medicine, finance, and autonomous driving. However, with this immense potential comes the imperative to ensure the safety and trustworthiness of generative models before their large-scale deployment.
In particular, as large language models (LLMs) become increasingly prevalent in real-world applications, understanding and mitigating the risks associated with potential backdoors is paramount. This talk will critically examine backdoors embedded in LLMs and explore their implications for the security and reliability of these models across different applications. Specifically, I will first present different strategies for injecting backdoors into LLMs and into a series of chain-of-thought (CoT) frameworks. I will then discuss potential defenses against known and unknown backdoors in LLMs, and provide an overview of how to assess, improve, and certify the resilience of LLMs against potential backdoors.