Finite-sum smooth optimization with SARAH

The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $\Omega(\sqrt{n}/\epsilon)$ for $n \leq \mathcal{O}(\epsilon^{-2})$, where $\epsilon$ denotes the attained accuracy $\mathbb{E}[\|\nabla F(\tilde{w})\|^2] \leq \epsilon$ for the output approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and show that its total complexity matches the worst-case lower bound of Fang et al. (2018) up to a constant factor when $n \leq \mathcal{O}(\epsilon^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++, which achieves sublinear convergence for general convex problems and linear convergence for strongly convex problems; we also provide a practical version for which numerical experiments on various datasets show improved performance.
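
To make the recursion concrete, below is a minimal sketch of the basic SARAH outer/inner loop from Nguyen et al. (2017a), assuming per-component gradients are available through a user-supplied `grad_i(w, i)`; the function name, step size, and loop lengths are illustrative defaults, and the sketch does not include the modification or the SARAH++ stopping rule analyzed in this paper.

```python
import numpy as np

def sarah(grad_i, n, w0, eta=0.01, n_outer=10, inner_len=None, rng=None):
    """Minimal sketch of the SARAH loop (Nguyen et al., 2017a).

    grad_i(w, i) is assumed to return the gradient of the i-th component
    f_i at w; all defaults below are illustrative, not the paper's choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = n if inner_len is None else inner_len  # inner-loop length (assumption)
    w = np.asarray(w0, dtype=float)

    for _ in range(n_outer):
        # Full gradient at the start of each outer iteration.
        v = np.mean([grad_i(w, i) for i in range(n)], axis=0)
        w_prev, w = w, w - eta * v
        for _ in range(m):
            i = rng.integers(n)
            # Recursive estimator: v is updated from two consecutive iterates,
            # reusing the previous estimate instead of a fixed snapshot as in SVRG.
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - eta * v
    return w
```

The variant analyzed in the paper and SARAH++ differ from this plain loop in how the inner-loop length and the returned iterate are chosen, but they share the same recursive gradient estimator.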

[1] Cong Fang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.

[2] Shai Shalev-Shwartz, et al. Stochastic dual coordinate ascent methods for regularized loss, 2012, J. Mach. Learn. Res.

[3] Peter Richtárik, et al. Semi-Stochastic Gradient Descent Methods, 2013, Front. Appl. Math. Stat.

[4] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[5] Zeyuan Allen-Zhu, et al. Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter, 2017, ArXiv.

[6] Quanquan Gu, et al. Stochastic Nested Variance Reduction for Nonconvex Optimization, 2018, J. Mach. Learn. Res.

[7] H. Robbins, et al. A Stochastic Approximation Method, 1951.

[8] Michael I. Jordan, et al. Non-convex Finite-Sum Optimization Via SCSG Methods, 2017, NIPS.

[9] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.

[10] Mark W. Schmidt, et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets, 2012, NIPS.

[11] Julien Mairal, et al. Optimization with First-Order Surrogate Functions, 2013, ICML.

[12] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.

[13] Zeyuan Allen-Zhu, et al. Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter, 2017, ICML.

[14] Zeyuan Allen-Zhu, et al. Natasha 2: Faster Non-Convex Optimization Than SGD, 2017, NeurIPS.

[15] Yi Zhou, et al. SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization, 2018, ArXiv.

[16] Lam M. Nguyen, et al. Stochastic Recursive Gradient Algorithm for Nonconvex Optimization, 2017, ArXiv.

[17] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.

[18] Alexander J. Smola, et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.

[19] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.

[20] Peter Richtárik, et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption, 2018, ICML.

[21] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.

[22] Lam M. Nguyen, et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017, ICML.

[23] Yoram Singer, et al. Pegasos: primal estimated sub-gradient solver for SVM, 2011, Math. Program.