Poster
in
Workshop: 4th Workshop on Self-Supervised Learning: Theory and Practice
Posterior Sampling on SimSiam: Rethinking Optimization in Siamese Self-Supervised Learning
Daniel De Mello · Ruqi Zhang · Bruno Ribeiro
Chen & He (2020) state that self-supervised pre-training can be performed without contrastive learning (CL), i.e., without negative pairs. Instead, their proposed approach (SimSiam) simply maximizes the similarity between two transformations of the same example. Interestingly, even though a global optimum of this objective is for SimSiam to collapse into a constant function that ignores its input, Chen & He (2020) argue that, in practice, training converges to non-global optima that yield useful representations of the input. A key component is a stop-gradient (SG) operation; without it, SimSiam quickly collapses to the global optimum. In this work, we investigate whether SG is genuinely indispensable, or whether satisfactory outcomes can be achieved by exploring the loss landscape more thoroughly. Specifically, we keep the loss landscape intact by leaving SimSiam's architecture unchanged, and explore it with SGHMC (Chen et al., 2014), a sampling method known for efficiently covering distant regions of the posterior distribution. Our empirical finding is that the posterior samples never reach collapsed points for properly chosen SGHMC step sizes, indicating ample room for future optimization methods other than SG that could avoid collapse. Although SGHMC turns out to be less effective than SG at improving downstream accuracy, we believe our results call for further investigation into the actual necessity of SG.
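To make the two ingredients discussed above concrete, the following is a minimal sketch in PyTorch, not the authors' implementation: it shows a SimSiam-style symmetrized cosine-similarity loss with an optional stop-gradient, followed by a single SGHMC-style parameter update in which friction and injected Gaussian noise replace a plain gradient step. The names (SimSiamSketch, sghmc_step), layer sizes, step size, friction, and temperature are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiamSketch(nn.Module):
    """Toy encoder f and predictor h; real SimSiam uses a ResNet backbone with MLP heads."""
    def __init__(self, dim=64, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.predictor = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, x1, x2, stop_gradient=True):
        z1, z2 = self.encoder(x1), self.encoder(x2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        if stop_gradient:
            # Stop-gradient: no gradients flow through the target-side embeddings.
            z1, z2 = z1.detach(), z2.detach()
        # Symmetrized negative cosine similarity between predictions and targets.
        # A constant (collapsed) encoder is a global optimum of this objective.
        return -0.5 * (F.cosine_similarity(p1, z2, dim=-1).mean()
                       + F.cosine_similarity(p2, z1, dim=-1).mean())

def sghmc_step(params, momenta, step_size=1e-4, friction=0.05, temperature=1.0):
    # One SGHMC-style update (after Chen et al., 2014): the momentum is damped by a
    # friction term and perturbed with Gaussian noise scaled to the friction and
    # step size, so the iterates sample around the posterior rather than settling
    # into a single optimum.
    with torch.no_grad():
        for p, m in zip(params, momenta):
            noise = torch.randn_like(p) * (2.0 * friction * step_size * temperature) ** 0.5
            m.mul_(1.0 - friction).add_(-step_size * p.grad + noise)
            p.add_(m)
            p.grad = None

# Usage: two noisy "views" stand in for two augmentations of the same batch.
model = SimSiamSketch()
momenta = [torch.zeros_like(p) for p in model.parameters()]
x = torch.randn(8, 64)
loss = model(x + 0.1 * torch.randn_like(x), x + 0.1 * torch.randn_like(x), stop_gradient=True)
loss.backward()
sghmc_step(list(model.parameters()), momenta)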