Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization
Chaobing Song · Yong Jiang · Yi Ma
Poster Session 5 #1492
Abstract:
In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}. In the general convex and smooth setting, VRADA attains an $O\big(\frac{1}{n}\big)$-accurate solution using $O(n\log\log n)$ stochastic gradient evaluations, where $n$ is the number of samples; this matches the lower bound for the setting up to a $\log\log n$ factor. In the strongly convex and smooth setting, VRADA matches the lower bound in the regime $n \le \Theta(\kappa)$, while it improves the rate in the regime $n\gg \kappa$ to $O\big(n +\frac{n\log(1/\epsilon)}{\log(n/\kappa)}\big)$, where $\kappa$ is the condition number. Besides improving on the best known complexity results, VRADA admits a more unified and simplified algorithmic implementation and convergence analysis covering both the general convex and strongly convex settings. Through experiments on real datasets, we show that VRADA outperforms existing methods on large-scale machine learning problems.
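
For context, the sketch below illustrates the finite-sum problem class the abstract refers to, $\min_x \frac{1}{n}\sum_{i=1}^n f_i(x)$, using a standard SVRG-style variance-reduced gradient estimator on ridge-regularized least squares. This is a minimal, generic illustration of variance reduction for finite sums, not the authors' VRADA algorithm; the problem data, step size, and epoch length below are illustrative assumptions.

```python
# Minimal sketch of the finite-sum setting:
#   min_x F(x) = (1/n) * sum_i f_i(x),
# here with f_i(x) = (1/2)(a_i^T x - b_i)^2 + (lam/2)||x||^2, optimized by a
# standard SVRG-style variance-reduced estimator (NOT the VRADA method itself).
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20                      # illustrative sample count and dimension
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
lam = 1e-2                           # ridge parameter (gives strong convexity)

def full_grad(x):
    # Gradient of F(x) = (1/2n)||Ax - b||^2 + (lam/2)||x||^2
    return A.T @ (A @ x - b) / n + lam * x

def component_grad(i, x):
    # Gradient of the i-th component f_i
    return A[i] * (A[i] @ x - b[i]) + lam * x

def svrg(x0, step=1e-3, epochs=20, inner=1000):
    x = x0.copy()
    for _ in range(epochs):
        x_ref = x.copy()
        g_ref = full_grad(x_ref)     # full gradient at the reference point
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced estimator: unbiased for full_grad(x), with
            # variance shrinking as x and x_ref approach the optimum.
            g = component_grad(i, x) - component_grad(i, x_ref) + g_ref
            x -= step * g
    return x

x_hat = svrg(np.zeros(d))
obj = 0.5 * np.mean((A @ x_hat - b) ** 2) + 0.5 * lam * np.dot(x_hat, x_hat)
print("objective value:", obj)
```

The cost model used in the abstract's complexity bounds counts exactly these component-gradient evaluations: each inner step above costs two, and each epoch additionally pays $n$ for the full gradient at the reference point.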