Almost Tune-Free Variance Reduction

Variance reduction algorithms, with SVRG and SARAH as representative members, have well-documented merits for empirical risk minimization. However, they require a grid search over parameters (the step size and the number of iterations per inner loop) to achieve optimal performance. This work introduces "almost tune-free" SVRG and SARAH schemes equipped with i) Barzilai-Borwein (BB) step sizes; ii) averaging; and iii) an inner-loop length adjusted to the BB step sizes. In particular, SVRG, SARAH, and their BB variants are first reexamined through an "estimate sequence" lens, enabling new averaging methods that tighten their convergence rates theoretically and improve their empirical performance when the step size or the inner-loop length is chosen large. A simple yet effective means of adjusting the number of iterations per inner loop is then developed to reinforce the merits of the proposed averaging schemes and BB step sizes. Numerical tests corroborate the proposed methods.
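To make the three ingredients concrete, below is a minimal sketch of an SVRG-style loop that combines a BB step size computed from successive snapshots, averaging of the inner iterates, and an inner-loop length coupled to the step size. The function names `grad_full` and `grad_i`, the specific BB formula, and the coupling heuristic `m ∝ 1/eta` are illustrative assumptions, not the paper's exact schedule.

```python
import numpy as np

def svrg_bb(grad_full, grad_i, x0, n, m0=50, eta0=0.01, outer_iters=20, seed=0):
    """Illustrative SVRG sketch with a BB step size, iterate averaging,
    and an inner-loop length tied to the step size.

    grad_full(x): full gradient over all n component functions (assumed).
    grad_i(x, i): gradient of the i-th component function (assumed).
    """
    rng = np.random.default_rng(seed)
    x_tilde = x0.copy()
    x_prev, g_prev = None, None
    eta, m = eta0, m0

    for s in range(outer_iters):
        g = grad_full(x_tilde)                      # full gradient at the snapshot
        if g_prev is not None:
            # BB-style step size from successive snapshots (assumed rule)
            dx, dg = x_tilde - x_prev, g - g_prev
            denom = abs(dx @ dg)
            if denom > 1e-12:
                eta = (dx @ dx) / (m * denom)
            # heuristic coupling: shorter inner loop when the step size grows
            m = max(1, int(m0 * eta0 / eta))        # illustrative only
        x_prev, g_prev = x_tilde.copy(), g.copy()

        x = x_tilde.copy()
        iterates = []
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced stochastic gradient (SVRG correction)
            v = grad_i(x, i) - grad_i(x_tilde, i) + g
            x = x - eta * v
            iterates.append(x.copy())
        # averaging of inner iterates to form the next snapshot
        x_tilde = np.mean(iterates, axis=0)
    return x_tilde
```

The sketch only shows the mechanics: the BB numerator/denominator use the snapshot displacement and gradient difference, and averaging replaces the common "last iterate" snapshot rule; the paper's averaging weights and inner-loop schedule may differ.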
