Lower Bounds for Smooth Nonconvex Finite-Sum Optimization

Smooth finite-sum optimization has been widely studied in both convex and nonconvex settings. However, existing lower bounds for finite-sum optimization are mostly limited to settings where each component function is (strongly) convex; lower bounds for nonconvex finite-sum optimization remain largely open. In this paper, we study lower bounds for smooth nonconvex finite-sum optimization, where the objective function is the average of $n$ nonconvex component functions. We prove tight lower bounds on the complexity of finding an $\epsilon$-suboptimal point and an $\epsilon$-approximate stationary point in different settings, over a wide regime of the smallest eigenvalue of the Hessian of the objective function (or of each component function). Given these lower bounds, we show that existing algorithms, including KatyushaX (Allen-Zhu, 2018), Natasha (Allen-Zhu, 2017), RapGrad (Lan and Yang, 2018), and StagewiseKatyusha (Chen and Yang, 2018), achieve optimal Incremental First-order Oracle (IFO) complexity (i.e., number of IFO calls) up to logarithmic factors for nonconvex finite-sum optimization. We also point out potential ways to further improve these complexity results, either by making stronger assumptions or through different convergence analyses.
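To fix ideas, the setup described above can be written in the standard finite-sum form (a minimal formalization under common conventions; the smoothness constant $L$ and the Euclidean norm in the stationarity measure are standard assumptions rather than statements taken from this paper):

$$\min_{x \in \mathbb{R}^d} \; F(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$

where each component function $f_i$ is $L$-smooth and possibly nonconvex. An IFO call takes an index $i \in \{1, \dots, n\}$ and a point $x$ and returns the pair $(f_i(x), \nabla f_i(x))$. A point $x$ is $\epsilon$-suboptimal if $F(x) - \inf_y F(y) \le \epsilon$, and is an $\epsilon$-approximate stationary point if $\|\nabla F(x)\|_2 \le \epsilon$; the IFO complexity of an algorithm is the number of IFO calls it needs to produce such a point.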

References

[1] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 1983.

[2] Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity. FOCS, 1977.

[3] Julien Mairal. Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning. SIAM J. Optim., 2014.

[4] Yossi Arjevani and Ohad Shamir. Dimension-Free Iteration Complexity of Finite Sum Optimization Problems. NIPS, 2016.

[5] Quanquan Gu et al. Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization. NeurIPS, 2018.

[6] Katta G. Murty and Santosh N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Math. Program., 1987.

[7] Yair Carmon et al. Accelerated Methods for Nonconvex Optimization. SIAM J. Optim., 2018.

[8] Francis Bach et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. NIPS, 2014.

[9] Alberto Bietti and Julien Mairal. Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure. NIPS, 2016.

[10] Alekh Agarwal and Léon Bottou. A Lower Bound for the Optimization of Finite Sums. ICML, 2014.

[11] Tong Zhang et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator. NeurIPS, 2018.

[12] Zeyuan Allen-Zhu. Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter. ICML, 2017.

[13] Tengyu Ma et al. Finding approximate local minima faster than gradient descent. STOC, 2016.

[14] Lin Xiao and Tong Zhang. A Proximal Stochastic Gradient Method with Progressive Variance Reduction. SIAM J. Optim., 2014.

[15] Yair Carmon et al. Lower bounds for finding stationary points II: first-order methods. Mathematical Programming, 2017.

[16] Naman Agarwal and Elad Hazan. Lower Bounds for Higher-Order Convex Optimization. COLT, 2017.

[17] Sham M. Kakade et al. Faster Eigenvector Computation via Shift-and-Invert Preconditioning. ICML, 2016.

[18] Mark W. Schmidt et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. NIPS, 2012.

[19] Rie Johnson and Tong Zhang. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. NIPS, 2013.

[20] Guanghui Lan and Yi Zhou. An optimal randomized incremental gradient method. Mathematical Programming, 2015.

[21] Shai Shalev-Shwartz. SDCA without Duality, Regularization, and Individual Convexity. ICML, 2016.

[22] Guanghui Lan and Yu Yang. Accelerated Stochastic Algorithms for Nonconvex Finite-sum and Multi-block Optimization. arXiv:1805.05411, 2018.

[23] Zeyuan Allen-Zhu and Yuanzhi Li. Even Faster SVD Decomposition Yet Without Agonizing Pain. NIPS, 2016.

[24] Yair Carmon et al. Lower bounds for finding stationary points I. Mathematical Programming, 2017.

[25] Blake Woodworth and Nathan Srebro. Tight Complexity Bounds for Optimizing Composite Objectives. NIPS, 2016.

[26] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2004.

[27] Shai Shalev-Shwartz. SDCA without Duality. arXiv preprint, 2015.

[28] Ohad Shamir et al. Oracle complexity of second-order methods for smooth convex optimization. Mathematical Programming, 2017.

[29] Zeyuan Allen-Zhu. Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. ICML, 2018.

[30] Zeyuan Allen-Zhu. Katyusha: the first direct acceleration of stochastic gradient methods. J. Mach. Learn. Res., 2016.

[31] Tianbao Yang et al. Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number. ICML, 2019.