Poster in Workshop: OPT 2021: Optimization for Machine Learning
Faster Quasi-Newton Methods for Linear Composition Problems
Betty Shea · Mark Schmidt
Despite its lack of theoretical guarantees, the limited memory version of BFGS (l-BFGS) is widely used in practice for large-scale optimization problems. In particular, when combined with the Wolfe conditions, l-BFGS tends to find solutions significantly faster than methods with known convergence rates such as gradient descent (GD). Similarly, inexact step-size searches have outsized practical impact despite, in theory, having the same worst-case complexity as using a constant step size. These search techniques are often used for linear composition problems, which are known to allow efficient line searches and subspace optimization (SO). In this paper, we propose a version of l-BFGS that is faster for linear composition problems. Our method combines an l-BFGS direction with a momentum direction using SO. We explore practical SO issues, including extending the Wolfe conditions from one dimension to multiple dimensions. We experimentally compare our method to standard l-BFGS and to an SO method that is known to be efficient.
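The sketch below is not the authors' method; it is a minimal illustration, under assumed choices (a least-squares outer loss, Nelder-Mead as the inner solver, and stand-in directions), of why linear composition problems f(x) = g(Ax) admit cheap subspace optimization: once A @ x and A @ d are cached for each direction d, every trial point in the subspace needs only an evaluation of g, with no further matrix-vector products.

```python
import numpy as np
from scipy.optimize import minimize

def g(z, b):
    """Outer loss of the composition f(x) = g(A x); least squares as an example."""
    return 0.5 * np.sum((z - b) ** 2)

def subspace_step(A, b, x, d_lbfgs, d_momentum):
    """Minimize f over the 2D subspace x + alpha * d_lbfgs + beta * d_momentum."""
    Ax = A @ x        # cached once per outer iteration
    Ad1 = A @ d_lbfgs  # one matrix-vector product per direction
    Ad2 = A @ d_momentum

    def phi(coeffs):
        alpha, beta = coeffs
        # Trial point inside the subspace: no further products with A needed.
        return g(Ax + alpha * Ad1 + beta * Ad2, b)

    res = minimize(phi, x0=np.zeros(2), method="Nelder-Mead")
    alpha, beta = res.x
    return x + alpha * d_lbfgs + beta * d_momentum

# Tiny usage example on a random least-squares problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)
x = np.zeros(20)
d1 = -A.T @ (A @ x - b)             # gradient direction as a stand-in for l-BFGS
d2 = rng.standard_normal(20) * 0.1  # stand-in for a momentum direction
x_new = subspace_step(A, b, x, d1, d2)
print(g(A @ x_new, b) <= g(A @ x, b))
```

The same caching argument is what makes one-dimensional line searches cheap for these problems; the two-dimensional case above simply replaces the scalar step size with a pair of coefficients, one per direction.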