

Poster in Workshop: Safe Generative AI

Memorization Detection Benchmark for Generative Image models

Marc Molina · Felice Burn


Abstract:

Generative models in medical imaging offer significant potential for data augmentation and privacy preservation, but they also pose risks of patient data memorization. This study presents a comprehensive, data-driven approach to evaluate and characterize the memorization behavior of generative models. We systematically compare various network architectures, loss functions, pretraining datasets, and distance metrics to identify optimal configurations for detecting potential privacy concerns in synthetic images. Our analysis reveals that self-supervised contrastive networks using Triplet Margin loss in models like DinoV2, DenseNet121, and ResNet50, when paired with Bray-Curtis or Standardized Euclidean distance metrics, demonstrate superior performance in detecting augmented copies of training images. We further apply our methodology to characterize the memorization behavior of a conditional diffusion image transformer model trained on mammography data. This work contributes a robust framework for evaluating generative models in medical imaging, offering a crucial tool for assessing the risk of patient data leakage in synthetic datasets.
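The two distance metrics named in the abstract can be illustrated with a minimal sketch. It assumes each image has already been mapped to an embedding vector by some feature network; the toy vectors, the helper names (`bray_curtis`, `standardized_euclidean`, `nearest_training_match`), and the use of per-dimension sample variance estimated from the training embeddings are all assumptions made for illustration, not details taken from the authors' benchmark.

```python
import math
from statistics import variance


def bray_curtis(u, v):
    """Bray-Curtis distance: sum|u_i - v_i| / sum|u_i + v_i|."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0


def standardized_euclidean(u, v, variances):
    """Euclidean distance with each dimension scaled by its variance."""
    return math.sqrt(
        sum((a - b) ** 2 / var for a, b, var in zip(u, v, variances))
    )


def nearest_training_match(query, train_embeddings, distance):
    """Return (index, distance) of the training embedding closest to query."""
    distances = [distance(query, t) for t in train_embeddings]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Toy stand-ins for embeddings (hypothetical values, not real features).
train = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [4.0, 1.0, 0.0]]
synthetic = [1.1, 0.1, 1.9]  # resembles train[0]

# Assumption: per-dimension sample variance taken over the training embeddings.
variances = [variance(column) for column in zip(*train)]

idx_bc, d_bc = nearest_training_match(synthetic, train, bray_curtis)
idx_se, d_se = nearest_training_match(
    synthetic, train, lambda u, v: standardized_euclidean(u, v, variances)
)
print(f"Bray-Curtis nearest: train[{idx_bc}] at {d_bc:.4f}")
print(f"Standardized Euclidean nearest: train[{idx_se}] at {d_se:.4f}")
```

A small nearest-neighbour distance suggests a synthetic image may be a near copy of a training image. How small counts as suspicious depends on the embedding network and the data, so no threshold is suggested here.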
