Poster

Boosting Alignment for Post-Unlearning Text-to-Image Generative Models

Myeongseob Ko · Henry Li · Zhun Wang · Jonathan Patsenker · Jiachen (Tianhao) Wang · Qinbin Li · Ming Jin · Dawn Song · Ruoxi Jia

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. However, this often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. Driven by these concerns, machine unlearning has become crucial for effectively purging undesirable knowledge from models. While existing literature has studied various unlearning techniques, these often either degrade unlearning quality or impair text-image alignment after unlearning, owing to the competing nature of the two objectives. To address these challenges, we first propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives, and we further derive a characterization of such an update. In addition, we design procedures to strategically diversify the unlearning and remaining datasets to boost performance. Our evaluation demonstrates that our method effectively removes diverse target classes from recent diffusion-based generative models and concepts from stable diffusion models, while maintaining close alignment with the models' original trained states, thus outperforming state-of-the-art baselines.
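The abstract's central idea is an update that improves both the unlearning objective and the alignment (retention) objective at once. The paper's actual characterization is not given here; as a minimal illustrative sketch, one well-known way to obtain a joint step that decreases both losses to first order is gradient projection in the style of PCGrad, where conflicting gradient components are projected away before summing. All function and variable names below are hypothetical.

```python
import numpy as np

def combine_gradients(g_unlearn: np.ndarray, g_retain: np.ndarray) -> np.ndarray:
    """Combine two objective gradients into a single step direction.

    If the gradients conflict (negative inner product), project each
    gradient onto the normal plane of the other before summing, so the
    combined step has a non-negative inner product with both original
    gradients, i.e. neither loss increases to first order.
    """
    g_u, g_r = g_unlearn.astype(float), g_retain.astype(float)
    if np.dot(g_u, g_r) < 0:
        # Remove the component of each gradient that opposes the other.
        g_u = g_u - (np.dot(g_unlearn, g_retain) / np.dot(g_retain, g_retain)) * g_retain
        g_r = g_r - (np.dot(g_retain, g_unlearn) / np.dot(g_unlearn, g_unlearn)) * g_unlearn
    return g_u + g_r

# Example with conflicting gradients (their dot product is negative):
g_unlearn = np.array([1.0, 0.0])
g_retain = np.array([-0.5, 1.0])
step = combine_gradients(g_unlearn, g_retain)
# step aligns (non-negatively) with both original gradients, so a
# gradient-descent update along -step decreases both losses locally.
```

This is only a first-order heuristic under the stated assumptions; the paper instead derives an optimal update with monotonic improvement guarantees at each unlearning iteration.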