Self-Supervised Relational Reasoning for Representation Learning
Massimiliano Patacchiola, Amos Storkey
Spotlight presentation: Orals & Spotlights Track 27: Unsupervised/Probabilistic
on 2020-12-10T07:00:00-08:00 - 2020-12-10T07:10:00-08:00
on 2020-12-10T07:00:00-08:00 - 2020-12-10T07:10:00-08:00
Poster Session 6 (more posters)
on 2020-12-10T09:00:00-08:00 - 2020-12-10T11:00:00-08:00
GatherTown: Learning with limited supervision ( Town D3 - Spot D3 )
on 2020-12-10T09:00:00-08:00 - 2020-12-10T11:00:00-08:00
GatherTown: Learning with limited supervision ( Town D3 - Spot D3 )
Join GatherTown
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Toggle Abstract Paper (in Proceedings / .pdf)
Abstract: In self-supervised learning, a system is tasked with achieving a surrogate objective by defining alternative targets on a set of unlabeled data. The aim is to build useful representations that can be used in downstream tasks, without costly manual annotation. In this work, we propose a novel self-supervised formulation of relational reasoning that allows a learner to bootstrap a signal from information implicit in unlabeled data. Training a relation head to discriminate how entities relate to themselves (intra-reasoning) and other entities (inter-reasoning), results in rich and descriptive representations in the underlying neural network backbone, which can be used in downstream tasks such as classification and image retrieval. We evaluate the proposed method following a rigorous experimental procedure, using standard datasets, protocols, and backbones. Self-supervised relational reasoning outperforms the best competitor in all conditions by an average 14% in accuracy, and the most recent state-of-the-art model by 3%. We link the effectiveness of the method to the maximization of a Bernoulli log-likelihood, which can be considered as a proxy for maximizing the mutual information, resulting in a more efficient objective with respect to the commonly used contrastive losses.