Poster in Workshop: Safe Generative AI

PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models

Michael-Andrei Panaitescu-Liess · Pankayaraj Pathmanathan · Yigitcan Kaya · Zora Che · Bang An · Sicheng Zhu · Aakriti Agrawal · Furong Huang


Abstract:

As the capabilities of large language models (LLMs) continue to expand, their usage has become increasingly prevalent. However, as reflected in numerous ongoing lawsuits related to LLM-generated content, addressing copyright infringement remains a significant challenge. In this paper, we introduce the first data poisoning attack specifically designed to induce the generation of copyrighted content by an LLM, even when the model has not been directly trained on the specific copyrighted material. We find that a straightforward attack—which integrates small fragments of copyrighted text into the poison samples—is surprisingly effective at priming the models to generate copyrighted content. Moreover, we demonstrate that current defenses are insufficient and largely ineffective against this type of attack, underscoring the need for further exploration of this emerging threat model.
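The attack's core step, as described in the abstract, is splicing small fragments of copyrighted text into otherwise benign training samples. The following is a minimal sketch of what such a poison-construction step could look like, not the paper's implementation: the function name make_poison_samples, the word-level fragmentation, the fragment length, and the number of fragments per sample are all hypothetical choices made for illustration.

import random

def make_poison_samples(copyrighted_text, benign_samples,
                        fragment_len=7, fragments_per_sample=2, seed=0):
    """Splice short fragments of a copyrighted text into benign samples.

    fragment_len is kept small so that no single poison sample
    reproduces a long verbatim span of the copyrighted material.
    (All parameter values here are illustrative assumptions.)
    """
    rng = random.Random(seed)
    words = copyrighted_text.split()
    poisoned = []
    for sample in benign_samples:
        tokens = sample.split()
        for _ in range(fragments_per_sample):
            # Pick a random short fragment of the copyrighted text.
            start = rng.randrange(max(1, len(words) - fragment_len))
            fragment = words[start:start + fragment_len]
            # Splice the fragment at a random position in the benign sample.
            pos = rng.randrange(len(tokens) + 1)
            tokens[pos:pos] = fragment
        poisoned.append(" ".join(tokens))
    return poisoned

# Toy usage: each benign sample absorbs two 7-word fragments of the target text.
if __name__ == "__main__":
    target = "it was the best of times it was the worst of times " * 3
    benign = ["the weather report predicts light rain over the valley today",
              "our quarterly results show steady growth across all regions"]
    for p in make_poison_samples(target, benign):
        print(p)

Under this kind of construction, no individual poison sample contains more than a short verbatim excerpt, which is consistent with the abstract's claim that the attack is subtle and that the model need never see the full copyrighted material during training.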
