Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization
Chong You, Zhihui Zhu, Qing Qu, Yi Ma
Spotlight presentation: Orals & Spotlights Track 13: Deep Learning/Theory
on 2020-12-08T19:10:00-08:00 - 2020-12-08T19:20:00-08:00
on 2020-12-08T19:10:00-08:00 - 2020-12-08T19:20:00-08:00
Poster Session 3 (more posters)
on 2020-12-08T21:00:00-08:00 - 2020-12-08T23:00:00-08:00
GatherTown: Deep Learning/Limited Supervision ( Town D1 - Spot A0 )
on 2020-12-08T21:00:00-08:00 - 2020-12-08T23:00:00-08:00
GatherTown: Deep Learning/Limited Supervision ( Town D1 - Spot A0 )
Join GatherTown
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Only iff poster is crowded, join Zoom . Authors have to start the Zoom call from their Profile page / Presentation History.
Toggle Abstract Paper (in Proceedings / .pdf)
Abstract: Recent advances have shown that implicit bias of gradient descent on over-parameterized models enables the recovery of low-rank matrices from linear measurements, even with no prior knowledge on the intrinsic rank. In contrast, for {\em robust} low-rank matrix recovery from {\em grossly corrupted} measurements, over-parameterization leads to overfitting without prior knowledge on both the intrinsic rank and sparsity of corruption. This paper shows that with a {\em double over-parameterization} for both the low-rank matrix and sparse corruption, gradient descent with {\em discrepant learning rates} provably recovers the underlying matrix even without prior knowledge on neither rank of the matrix nor sparsity of the corruption. We further extend our approach for the robust recovery of natural images by over-parameterizing images with deep convolutional networks. Experiments show that our method handles different test images and varying corruption levels with a single learning pipeline where the network width and termination conditions do not need to be adjusted on a case-by-case basis. Underlying the success is again the implicit bias with discrepant learning rates on different over-parameterized parameters, which may bear on broader applications.