Poster in Workshop: 4th Workshop on Self-Supervised Learning: Theory and Practice
On Improving the Sample Efficiency of Non-Contrastive SSL
Kumar Krishna Agrawal · Arna Ghosh · Adam Oberman · Blake Richards
In this work, we provide theoretical insights into the implicit bias of the Barlow Twins and VICReg losses that can explain common pretraining heuristics in non-contrastive SSL and guide the development of more principled recommendations. Our first insight is that the orthogonality of the features is more important than projector dimensionality for learning good representations. Based on this, we empirically demonstrate that low-dimensional projector heads are sufficient with appropriate regularization, contrary to the existing heuristic. Our second theoretical insight suggests that using multiple data augmentations better represents the desiderata of the SSL objective. Based on this, we demonstrate that leveraging more augmentations per sample improves representation quality and trainability. In particular, it improves optimization convergence, leading to better features emerging earlier in training. Remarkably, we demonstrate that we can reduce the pretraining dataset size by up to 4x while maintaining accuracy and improving convergence simply by using more data augmentations. Combining these insights, we present pretraining recommendations that reduce wall-clock time by 2x and improve downstream performance on the CIFAR-10 and STL-10 datasets.
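To make the two recommendations concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of a Barlow Twins-style objective that (a) uses a low-dimensional projector head while keeping features decorrelated via the off-diagonal term, and (b) averages the loss over all pairs drawn from more than two augmentations per sample. Names such as `proj_dim`, `lambda_offdiag`, and `multi_aug_barlow_loss` are illustrative assumptions, not identifiers from the paper.

```python
import itertools
import torch
import torch.nn as nn


def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor,
                      lambda_offdiag: float = 5e-3) -> torch.Tensor:
    """Standard Barlow Twins loss between two embedding batches of shape (N, D)."""
    n, _ = z1.shape
    # Standardize each feature dimension across the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                                   # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()        # pull diagonal toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # push off-diagonal toward 0
    return on_diag + lambda_offdiag * off_diag


def multi_aug_barlow_loss(embeddings: list[torch.Tensor],
                          lambda_offdiag: float = 5e-3) -> torch.Tensor:
    """Average the pairwise loss over all pairs of k > 2 augmented views."""
    pairs = itertools.combinations(range(len(embeddings)), 2)
    losses = [barlow_twins_loss(embeddings[i], embeddings[j], lambda_offdiag)
              for i, j in pairs]
    return torch.stack(losses).mean()


# Low-dimensional projector head (e.g., 64-d rather than the usual 8192-d),
# relying on the off-diagonal regularization above to keep features orthogonal.
proj_dim = 64
projector = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, proj_dim))

# Four augmented views of the same batch -> four embedding batches.
views = [torch.randn(32, 512) for _ in range(4)]          # stand-ins for backbone outputs
loss = multi_aug_barlow_loss([projector(v) for v in views])
loss.backward()
```

In this sketch, using k augmentations yields k(k-1)/2 view pairs per sample, which is one simple way to realize "more augmentations per sample" within the same pairwise loss; the exact scheme used in the paper may differ.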