Rethinking Pre-training and Self-training
Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Dogus Cubuk, Quoc V. Le
Oral presentation: Orals & Spotlights Track 12: Vision Applications
on 2020-12-08T18:15:00-08:00 - 2020-12-08T18:30:00-08:00
Poster Session 3
on 2020-12-08T21:00:00-08:00 - 2020-12-08T23:00:00-08:00
GatherTown: Deep Learning/Limited Supervision ( Town D1 - Spot B2 )
Abstract: Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, report the striking result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data in the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training, with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) even when pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training helps when we use one fifth of the labeled data, but hurts accuracy when we use all of the labeled data. Self-training, on the other hand, shows positive improvements of +1.3AP to +3.4AP across all dataset sizes. In other words, self-training works well exactly in the setup where pre-training does not (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is much smaller than COCO, pre-training does help significantly, yet self-training still improves upon the pre-trained model. On COCO object detection, we achieve 53.8AP, an improvement of +1.7AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5mIOU, an improvement of +1.5mIOU over the previous state-of-the-art result by DeepLabv3+.
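For readers unfamiliar with self-training, the sketch below illustrates the generic teacher/student pseudo-labeling recipe the abstract refers to. It is a minimal toy example using scikit-learn on synthetic data, not the authors' detection/segmentation pipeline; the model classes, data, and confidence threshold are illustrative assumptions, and the paper additionally relies on strong data augmentation and loss normalization when training the student.

# Minimal sketch of the generic self-training (pseudo-labeling) loop.
# This is NOT the paper's detection/segmentation setup; it only
# illustrates the teacher -> pseudo-label -> student recipe on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy "labeled" and "unlabeled" sets (stand-ins for COCO and the extra images).
X_labeled = rng.normal(size=(200, 16))
y_labeled = (X_labeled[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
X_unlabeled = rng.normal(size=(2000, 16))

# 1) Train a teacher on the labeled data only.
teacher = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# 2) Generate pseudo-labels on the unlabeled data, keeping confident ones
#    (the 0.9 threshold here is an arbitrary illustrative choice).
probs = teacher.predict_proba(X_unlabeled)
confident = probs.max(axis=1) > 0.9
X_pseudo = X_unlabeled[confident]
y_pseudo = probs[confident].argmax(axis=1)

# 3) Train a student on labeled + pseudo-labeled data; in the paper this step
#    uses strong data augmentation on both sources of data.
X_combined = np.concatenate([X_labeled, X_pseudo])
y_combined = np.concatenate([y_labeled, y_pseudo])
student = LogisticRegression(max_iter=1000).fit(X_combined, y_combined)

print("student accuracy on the labeled set:", student.score(X_labeled, y_labeled))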