Poster in Workshop: Safe Generative AI

PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models

Michael-Andrei Panaitescu-Liess · Pankayaraj Pathmanathan · Yigitcan Kaya · Zora Che · Bang An · Sicheng Zhu · Aakriti Agrawal · Furong Huang


Abstract:

As the capabilities of large language models (LLMs) continue to expand, their usage has become increasingly prevalent. However, as reflected in numerous ongoing lawsuits related to LLM-generated content, addressing copyright infringement remains a significant challenge. In this paper, we introduce the first data poisoning attack specifically designed to induce the generation of copyrighted content by an LLM, even when the model has not been directly trained on the specific copyrighted material. We find that a straightforward attack—which integrates small fragments of copyrighted text into the poison samples—is surprisingly effective at priming the models to generate copyrighted content. Moreover, we demonstrate that current defenses are insufficient and largely ineffective against this type of attack, underscoring the need for further exploration of this emerging threat model.
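The attack's core step, as described in the abstract, is splicing small fragments of copyrighted text into otherwise benign training samples. The following is a minimal sketch of what such a poison-construction step could look like, not the paper's implementation: the function name make_poison_samples, the word-level fragmentation, the fragment length, and the number of fragments per sample are all hypothetical choices made for illustration.

import random

def make_poison_samples(copyrighted_text, benign_samples,
                        fragment_len=7, fragments_per_sample=2, seed=0):
    """Splice short fragments of a copyrighted text into benign samples.

    fragment_len is kept small so that no single poison sample
    reproduces a long verbatim span of the copyrighted material.
    (All parameter values here are illustrative assumptions.)
    """
    rng = random.Random(seed)
    words = copyrighted_text.split()
    poisoned = []
    for sample in benign_samples:
        tokens = sample.split()
        for _ in range(fragments_per_sample):
            # Pick a random short fragment of the copyrighted text.
            start = rng.randrange(max(1, len(words) - fragment_len))
            fragment = words[start:start + fragment_len]
            # Splice the fragment at a random position in the benign sample.
            pos = rng.randrange(len(tokens) + 1)
            tokens[pos:pos] = fragment
        poisoned.append(" ".join(tokens))
    return poisoned

# Toy usage: each benign sample absorbs two 7-word fragments of the target text.
if __name__ == "__main__":
    target = "it was the best of times it was the worst of times " * 3
    benign = ["the weather report predicts light rain over the valley today",
              "our quarterly results show steady growth across all regions"]
    for p in make_poison_samples(target, benign):
        print(p)

Under this kind of construction, no individual poison sample contains more than a short verbatim excerpt, which is consistent with the abstract's claim that the attack is subtle and that the model need never see the full copyrighted material during training.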
