Plenary speaker in Workshop: OPT 2023: Optimization for Machine Learning
DoG is SGD's best friend: toward tuning-free stochastic optimization
Yair Carmon
Abstract:
While stochastic optimization methods drive continual improvements in machine learning, choosing the optimization parameters—and particularly the learning rate (LR)—remains difficult. In this talk, I will describe our work on removing LR tuning from stochastic gradient descent (SGD), culminating in a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG). We show that DoG removes the need to tune the learning rate both theoretically (obtaining strong parameter-free convergence guarantees) and empirically (performing nearly as well as expensively tuned SGD on neural network training tasks).
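The abstract names the method but does not state the step size formula. As a rough illustration of the "distance over gradients" idea only, the sketch below sets the step size to the maximum distance traveled from the initial point divided by the square root of the running sum of squared gradient norms, with a small initial-movement constant r_eps so the first step is nonzero. The function names, the r_eps constant, and the quadratic usage example are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np

def dog_style_sgd(grad_fn, x0, steps=1000, r_eps=1e-6):
    """Sketch of a distance-over-gradients style step size:
    eta_t = (max distance from the initial point so far)
            / sqrt(sum of squared stochastic gradient norms)."""
    x = x0.copy()
    max_dist = r_eps      # assumed small initial "distance" so eta_0 > 0
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad_fn(x)                               # stochastic gradient at x
        grad_sq_sum += float(np.dot(g, g))           # accumulate ||g_t||^2
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
        eta = max_dist / np.sqrt(grad_sq_sum)        # "distance over gradients"
        x = x - eta * g                              # plain SGD update with eta_t
    return x

# Hypothetical usage: minimize a simple quadratic with noisy gradients.
rng = np.random.default_rng(0)
grad = lambda x: 2 * x + 0.1 * rng.standard_normal(x.shape)
x_final = dog_style_sgd(grad, x0=np.ones(10), steps=2000)
```

Note that no learning rate is passed in: the only constant is r_eps, which sets the scale of the very first step rather than the overall step size.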