Poster in Workshop: 5th Workshop on Self-Supervised Learning: Theory and Practice
Influence Estimation in Self-Supervised Learning
Nidhin Harilal · Reza Akbarian Bafghi · Amit Rege · Maziar Raissi · Claire Monteleoni
Self-supervised learning (SSL) has emerged as a key method for training powerful encoders on large-scale unlabeled data. However, recent research indicates that SSL encoders may over-rely on, or even memorize, many data points from their training set. While supervised learning benefits from tools such as influence functions to identify such memorable data points, these methods do not apply effectively to SSL because they rely on labels. In this work, we introduce a new label-free definition of the influence function for SSL. Our implementation uses Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) to estimate the influence function efficiently without any retraining, so it can be applied to any pre-trained SSL model to assess the effect of individual training examples on the model's behavior. Our results suggest that the proposed method is informative about memorization that can be detrimental to SSL pre-training.
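To make the idea of estimating influence without retraining concrete, here is a minimal sketch of the classical influence-function computation, not the authors' implementation. It assumes per-example gradients of some label-free loss have already been computed, and it swaps in a damped empirical-Fisher matrix for the curvature instead of EK-FAC. The function name `influence_scores` and the `damping` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def influence_scores(train_grads, query_grad, damping=1e-3):
    """Approximate influence of each training example on a query loss.

    train_grads: (n, p) array, one per-example loss gradient per row
                 (assumed to come from a label-free SSL loss).
    query_grad:  (p,) gradient of the query loss w.r.t. the parameters.
    damping:     ridge term that keeps the curvature matrix invertible.

    Assumption: the Hessian is replaced by a damped empirical-Fisher
    surrogate G^T G / n + damping * I. This stands in for EK-FAC, which
    is not implemented here.
    """
    n, p = train_grads.shape
    curvature = train_grads.T @ train_grads / n + damping * np.eye(p)
    # Solve curvature @ v = query_grad rather than forming an explicit inverse.
    v = np.linalg.solve(curvature, query_grad)
    # Classical form: score_i = -query_grad^T H^{-1} train_grad_i.
    return -train_grads @ v

# Toy usage with two training examples and two parameters.
grads = np.array([[1.0, 0.0], [0.0, 2.0]])
scores = influence_scores(grads, np.array([1.0, 0.0]), damping=0.5)
```

Under this sign convention, a negative score means that upweighting the training example would lower the query loss. Solving a dense linear system is only practical for small parameter counts, which is why structured curvature approximations such as EK-FAC are used for real encoders.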