Poster
Loss Landscape Characterization of Neural Networks without Over-Parametrization
Rustem Islamov · Niccolò Ajroldi · Antonio Orvieto · Aurelien Lucchi
West Ballroom A-D #6008
Modern machine learning depends heavily on the effectiveness of optimization techniques. While deep learning models have achieved remarkable empirical results in training, their theoretical underpinnings remain elusive. Ensuring the convergence of optimization methods requires imposing specific structures on the objective function that often do not hold in practice. One prominent example is the widely recognized Polyak-Łojasiewicz (PL) inequality, which has garnered considerable attention in recent years. However, validating such assumptions for deep neural networks entails substantial and often impractical levels of over-parametrization. To address this limitation, we propose a novel class of functions that can characterize the loss landscape of modern deep models without requiring extensive over-parametrization and that can also include saddle points. Crucially, we prove that gradient-based optimizers have theoretical convergence guarantees under this assumption. Finally, we validate the soundness of our assumption through both theoretical analysis and empirical experiments across a diverse range of deep learning models.
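
For reference, the PL inequality mentioned in the abstract can be stated as follows. This is the standard textbook form, not the new function class proposed in the poster; the symbols $f$, $f^*$, $\mu$, and $L$ are notation assumed here for illustration. A differentiable function $f$ with minimum value $f^*$ satisfies the PL inequality with constant $\mu > 0$ if

\[
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x .
\]

If $f$ is additionally $L$-smooth, gradient descent with step size $1/L$ converges linearly:

\[
f(x_k) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^*\bigr).
\]

Under this inequality every stationary point ($\nabla f(x) = 0$) is a global minimizer, so a PL function cannot have saddle points. That is the restriction the abstract contrasts with its proposed class.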