Poster
in
Workshop: OPT 2023: Optimization for Machine Learning
Revisiting Random Weight Perturbation for Efficiently Improving Generalization
Tao Li · Weihao weihao · Qinghua Tao · Zehao Lei · Yingwen Wu · Kun Fang · Mingzhen He · Xiaolin Huang
Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental problem in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one, led by sharpness-aware minimization (SAM), minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP); the other minimizes the expected Bayes objective with random weight perturbation (RWP). Although RWP has advantages in training time and is closely linked to AWP on a mathematical basis, its empirical performance consistently lags behind that of AWP. In this paper, we revisit RWP and analyze its convergence properties. We find that RWP requires a much larger perturbation magnitude than AWP, which leads to convergence issues. To resolve this, we propose m-RWP, which incorporates the original loss objective to aid convergence, substantially improving the performance of RWP. Compared with SAM, m-RWP is more efficient, since its two gradient steps can be computed in parallel and it converges faster, while delivering comparable or even better performance. We will release the code for reproducibility.
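The core idea described above can be illustrated with a minimal sketch: descend on a mixture of the gradient of the original loss at the current weights and the gradient at randomly perturbed weights. This is a toy NumPy illustration of the mixed-gradient idea, not the paper's implementation; the quadratic objective and the names `lam`, `sigma`, and `lr` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Toy quadratic objective standing in for the training loss.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    # Gradient of the toy quadratic objective.
    return w

def m_rwp_step(w, lr=0.1, sigma=0.5, lam=0.5):
    # Random weight perturbation drawn around the current weights.
    eps = rng.normal(scale=sigma, size=w.shape)
    g_orig = grad(w)        # gradient of the original loss at w
    g_pert = grad(w + eps)  # gradient at the randomly perturbed weights
    # The two gradients are independent of each other, so they could be
    # computed in parallel; here they are simply mixed with weight lam.
    return w - lr * (lam * g_orig + (1 - lam) * g_pert)

w = rng.normal(size=5)
for _ in range(100):
    w = m_rwp_step(w)
```

Mixing in the gradient of the unperturbed loss (the `lam * g_orig` term) is what distinguishes this sketch of m-RWP from plain RWP, which would use only `g_pert`; the mixture keeps the descent direction anchored to the original objective even when the perturbation magnitude `sigma` is large.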