Self-Supervised Visual Representation Learning from Hierarchical Grouping
Xiao Zhang, Michael Maire
Spotlight presentation: Orals & Spotlights Track 12: Vision Applications
Tue, 8 Dec 2020, 19:00–19:10 PST
Poster Session 3
Tue, 8 Dec 2020, 21:00–23:00 PST
GatherTown: Vision (Town E0 - Spot A0)
Abstract: We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy. A small supervised dataset suffices for training this grouping primitive. Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure. These predictions serve as guidance for self-supervised contrastive feature learning: we task a deep network with producing per-pixel embeddings whose pairwise distances respect the region hierarchy. Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks. We additionally explore applications to semantic region search and video-based object instance tracking.
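To make the core idea concrete, below is a minimal illustrative sketch of hierarchy-guided contrastive feature learning: given per-pixel embeddings and a predicted region map, same-region pixel pairs are pulled together and cross-region pairs are pushed apart with a margin hinge. All names, the flat (non-hierarchical) region map, and the specific loss form are assumptions for illustration; the paper's actual objective constrains pairwise embedding distances to respect the full region tree, not just a single partition.

```python
import numpy as np

def pairwise_contrastive_loss(embeddings, regions, margin=1.0,
                              n_pairs=1000, rng=None):
    """Illustrative sketch (not the paper's exact loss): sample random
    pixel pairs; same-region pairs should have small embedding distance,
    cross-region pairs should be at least `margin` apart.

    embeddings: (H, W, D) per-pixel feature array.
    regions:    (H, W) integer region labels (a flat stand-in for the
                hierarchical grouping predicted in the paper).
    """
    rng = np.random.default_rng(rng)
    H, W, _ = embeddings.shape

    # Sample n_pairs random pixel pairs (rows and columns independently).
    ys = rng.integers(0, H, size=(n_pairs, 2))
    xs = rng.integers(0, W, size=(n_pairs, 2))

    a = embeddings[ys[:, 0], xs[:, 0]]          # first pixel of each pair
    b = embeddings[ys[:, 1], xs[:, 1]]          # second pixel of each pair
    dist = np.linalg.norm(a - b, axis=1)        # pairwise embedding distance

    same = regions[ys[:, 0], xs[:, 0]] == regions[ys[:, 1], xs[:, 1]]

    # Same-region pairs: squared distance. Cross-region: squared hinge.
    loss = np.where(same, dist ** 2,
                    np.maximum(0.0, margin - dist) ** 2)
    return loss.mean()
```

For example, with embeddings that already cluster perfectly by region, the loss is zero: same-region distances are zero and cross-region distances exceed the margin, so both terms vanish.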