Poster
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
Abhimanyu Hans · John Kirchenbauer · Yuxin Wen · Neel Jain · Hamid Kazemi · Prajwal Singhania · Siddharth Singh · Gowthami Somepalli · Jonas Geiping · Abhinav Bhatele · Tom Goldstein
A growing body of work has shown that large language models memorize a portion of their training data and can reproduce it verbatim at inference time. This observation has become a key issue for the community, as it poses major privacy risks for data owners and exposes companies to the legal risk of copyright-infringement claims. To mitigate training-data exposure without sacrificing model performance, we introduce a simple but subtle modification to the standard next-token prediction objective for autoregressive LLMs that we call the goldfish loss. During training, a fraction of the tokens in each training sequence is excluded from the loss computation, so the model is never supervised to predict those tokens. Later, when generating text autoregressively, these dropped tokens inhibit verbatim reproduction of the complete chain of tokens from the training sequence. We run extensive experiments training billion-scale Llama-2 models from scratch and demonstrate significant reductions in extractable sequences with little to no impact on validation perplexity or downstream benchmarks.
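The abstract describes the goldfish loss only at a high level: a fraction of target tokens in each training sequence is excluded from the next-token prediction loss. Below is a minimal PyTorch sketch of that idea; the function name goldfish_loss, the rule of dropping every k-th target position, and the value k=4 are illustrative assumptions, not necessarily the paper's exact masking scheme.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, input_ids, k=4, ignore_index=-100):
    """Next-token prediction loss with a fraction (1/k) of target tokens
    excluded from supervision. Dropping every k-th position is an
    illustrative choice; the paper's masking rule may differ.

    logits:    (batch, seq_len, vocab_size) model outputs
    input_ids: (batch, seq_len) token ids of the training sequence
    """
    # Standard causal LM shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous().clone()

    # Exclude every k-th target position from the loss computation,
    # so the model receives no supervision on those tokens.
    positions = torch.arange(shift_labels.size(1), device=shift_labels.device)
    drop_mask = (positions % k) == (k - 1)
    shift_labels[:, drop_mask] = ignore_index

    # cross_entropy skips positions set to ignore_index.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
```

Because the dropped positions never receive supervision, the model cannot learn the complete chain of tokens in any training sequence, while the remaining (k-1)/k of tokens still drive ordinary language-model learning.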