Poster in Workshop: Regulatable ML: Towards Bridging the Gaps between Machine Learning Research and Regulations
Is EMA Robust? Examining the Robustness of Data Auditing and a Novel Non-calibration Extension
Ayush Alag · Yangsibo Huang · Kai Li
Abstract:
Auditing data usage in machine learning models is crucial for regulatory compliance, especially with sensitive data such as medical records. In this study, we scrutinize potential vulnerabilities in a recognized baseline method, Ensembled Membership Auditing (EMA), which employs membership inference attacks to determine whether a specific model was trained on a particular dataset. We identify a novel False Negative Error Pattern that arises when EMA is applied to large datasets under adversarial defenses such as dropout, model pruning, and MemGuard. Our analysis across three datasets shows that larger convolutional models pose a greater challenge for EMA, but a novel metric-set analysis improves performance by up to $5\%$. To extend the applicability of our improvements, we introduce EMA-Zero, a GAN-based dataset auditing method that does not require an external calibration dataset. Notably, EMA-Zero performs comparably to EMA using synthetic calibration data generated by a GAN trained on as few as 100 samples.
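To make the auditing setup concrete, the following is a minimal sketch of an EMA-style dataset audit, not the authors' implementation. It assumes illustrative choices throughout: the per-sample membership metrics (correctness, confidence, negative entropy), the 95th-percentile calibration rule, and a Welch's t-test for aggregating votes are all placeholders for whatever the actual method uses. In the EMA-Zero variant described above, `calib_loader` would be replaced by samples drawn from a GAN trained on a small number of examples rather than an external calibration dataset.

```python
# Hypothetical sketch of an EMA-style dataset audit (not the paper's exact code).
# Idea: compute per-sample membership metrics on the query dataset, calibrate
# decision thresholds on a calibration dataset, aggregate per-sample votes, and
# run a statistical test to decide whether the query set was used in training.

import numpy as np
from scipy import stats
import torch
import torch.nn.functional as F


def per_sample_metrics(model, loader, device="cpu"):
    """Return per-sample (correctness, confidence, negative entropy) arrays."""
    model.eval()
    correct, conf, neg_ent = [], [], []
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            probs = F.softmax(model(x), dim=1)
            top_p, pred = probs.max(dim=1)
            ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            correct.append((pred == y).float().cpu().numpy())
            conf.append(top_p.cpu().numpy())
            neg_ent.append((-ent).cpu().numpy())
    return [np.concatenate(m) for m in (correct, conf, neg_ent)]


def ema_audit(model, query_loader, calib_loader, alpha=0.05, device="cpu"):
    """Flag the query set as 'used in training' if its aggregated membership
    votes are significantly higher than those of the calibration set."""
    q_metrics = per_sample_metrics(model, query_loader, device)
    c_metrics = per_sample_metrics(model, calib_loader, device)

    # Calibrate one threshold per metric on the calibration (non-member) data,
    # then count how many metrics vote 'member' for each sample.
    q_votes = np.zeros_like(q_metrics[0])
    c_votes = np.zeros_like(c_metrics[0])
    for q, c in zip(q_metrics, c_metrics):
        threshold = np.quantile(c, 0.95)  # assumed calibration rule
        q_votes += (q > threshold).astype(float)
        c_votes += (c > threshold).astype(float)

    # Two-sample test: do query samples receive more member votes than
    # calibration samples would by chance?
    _, p_value = stats.ttest_ind(q_votes, c_votes, equal_var=False)
    is_member = q_votes.mean() > c_votes.mean() and p_value < alpha
    return is_member, p_value
```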