Poster in Workshop: Statistical Frontiers in LLMs and Foundation Models
Optimizing Adversarial Samples for Tighter Privacy Auditing in Final Model-Only Settings
Sangyeon Yoon · Wonje Jeung · Albert No
Abstract:
Differential Privacy (DP) offers a rigorous framework for protecting individual data during machine learning model training. Integrating DP into deep learning is commonly achieved with Differentially Private Stochastic Gradient Descent (DP-SGD). However, accurately auditing the actual privacy leakage remains challenging, especially when only the final trained model is accessible, a common scenario in practical deployments. While some previous methods have successfully performed privacy auditing in this final-model-only setting, they often rely on strong additional assumptions, such as access to the initial model parameters or to specific subsets of the training data, which may not be feasible.

In this work, we address the challenge of performing tighter privacy auditing without imposing any assumptions beyond access to the final model. We propose an input-space auditing method that uses loss-based membership inference attacks (MIAs) enhanced by an adversarial generator. This generator crafts an adversarial sample that maximizes the difference in loss distributions between models trained with and without a canary sample, using only gradients from the final model.

Our approach significantly narrows the gap between theoretical privacy bounds and empirical estimates, providing a more accurate assessment of privacy leakage. Experiments on MNIST and CIFAR-10 demonstrate that our method improves the tightness of privacy audits by 3.08 and 2.91, respectively, compared to canary-based methods under a privacy budget of $\epsilon = 10$. These findings show that effective, tighter privacy auditing is achievable even when only the final model is available, deepening our understanding of privacy risks in machine learning without relying on strong additional assumptions.
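The minimal PyTorch sketch below illustrates the general idea described above: craft an audit sample by gradient ascent on the input using only the final model, then run a simple loss-threshold membership test. The function names, the proxy objective (maximizing the final model's loss on the crafted sample), and the hyperparameters are illustrative assumptions rather than the paper's exact generator or attack; in a full audit, repeated membership guesses would be converted into an empirical privacy estimate via the standard hypothesis-testing interpretation of DP.

```python
# Hypothetical sketch: crafting an audit sample with only the final model,
# then a simple loss-threshold membership inference attack (MIA).
# The objective here (maximize the final model's loss on the sample) is an
# illustrative proxy, not necessarily the paper's exact generator objective.
import torch
import torch.nn.functional as F

def craft_audit_sample(final_model, canary_x, canary_y, steps=100, lr=0.01):
    """Gradient ascent on the input, using gradients from the final model only."""
    final_model.eval()
    x = canary_x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(final_model(x), canary_y)
        (-loss).backward()        # ascend the loss surface w.r.t. the input
        opt.step()
        x.data.clamp_(0.0, 1.0)   # keep the sample in a valid pixel range
    return x.detach()

def loss_based_mia(final_model, x, y, threshold):
    """Guess 'member' (canary was in the training set) iff the loss is small."""
    with torch.no_grad():
        loss = F.cross_entropy(final_model(x), y)
    return loss.item() < threshold
```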