

Poster

Measuring Dejavu Memorization Efficiently

Narine Kokhlikyan · Bargav Jayaraman · Florian Bordes · Chuan Guo · Kamalika Chaudhuri

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Recent work has shown that representation learning models may memorize their training data, calling into question their generalization abilities. As a concrete example, the déjà vu method shows that for certain representation learning models and training images, it is sometimes possible to correctly predict the foreground label given only the representation of the background – better than through dataset-level correlations. However, this measurement method requires training two models – one to estimate dataset-level correlations and the second to estimate memorization. This two-model setup makes it infeasible to evaluate memorization in large open-source representation learning models. In this work, we propose simple alternative methods to estimate dataset-level correlations, and show that these can be used to approximate an off-the-shelf model's memorization ability without having to perform any retraining. We apply our technique to several open-source image representation learning and vision-language models. Our results show that different ways of measuring memorization yield very similar aggregate results, and that open-source models typically have lower aggregate memorization than similar models trained on a subset of the data.
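To make the measurement setup concrete, below is a minimal sketch of how a déjà vu-style memorization score might be computed: a k-NN probe predicts each image's foreground label from its background-crop embedding, and the resulting accuracy is compared against a dataset-level correlation baseline. This is not the authors' implementation; the embeddings and labels are synthetic placeholders, and the baseline here is chance-level rather than either of the paper's proposed estimators.

```python
# Illustrative sketch of a deja vu-style memorization probe (assumptions labeled below;
# not the authors' code). We assume background-crop embeddings and foreground labels
# are already available from the model under test.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real data.
n_public, n_target, dim, n_classes = 1000, 500, 64, 10
public_emb = rng.normal(size=(n_public, dim))        # "public" set used to fit the probe
public_labels = rng.integers(n_classes, size=n_public)
target_emb = rng.normal(size=(n_target, dim))        # training images being tested
target_labels = rng.integers(n_classes, size=n_target)

def knn_label_accuracy(train_emb, train_labels, test_emb, test_labels, k=20):
    """Predict each test image's foreground label from its background-crop
    embedding via k-NN over the public set, and return the accuracy."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_emb, train_labels)
    return knn.score(test_emb, test_labels)

# Accuracy achieved with the target model's representations...
target_acc = knn_label_accuracy(public_emb, public_labels, target_emb, target_labels)

# ...versus a dataset-level correlation baseline. The original method estimates this
# with a second model trained on disjoint data; the paper proposes cheaper proxies
# that avoid retraining. Here we simply use chance level as a placeholder.
baseline_acc = 1.0 / n_classes

# The gap between the two serves as the aggregate memorization signal.
print(f"target: {target_acc:.3f}  baseline: {baseline_acc:.3f}  "
      f"gap: {target_acc - baseline_acc:.3f}")
```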
