Lower Bounds for Smooth Nonconvex Finite-Sum Optimization

Smooth finite-sum optimization has been widely studied in both convex and nonconvex settings. However, existing lower bounds for finite-sum optimization are mostly limited to settings where each component function is (strongly) convex; lower bounds for nonconvex finite-sum optimization remain largely open. In this paper, we study lower bounds for smooth nonconvex finite-sum optimization, where the objective function is the average of $n$ nonconvex component functions. We prove tight lower bounds on the complexity of finding an $\epsilon$-suboptimal point and an $\epsilon$-approximate stationary point in different settings, over a wide regime of the smallest eigenvalue of the Hessian of the objective function (or of each component function). Given these lower bounds, we show that existing algorithms, including KatyushaX (Allen-Zhu, 2018), Natasha (Allen-Zhu, 2017), RapGrad (Lan and Yang, 2018), and StagewiseKatyusha (Chen and Yang, 2018), achieve optimal Incremental First-order Oracle (IFO) complexity (i.e., number of IFO calls) up to logarithmic factors for nonconvex finite-sum optimization. We also point out potential ways to further improve these complexity results, either by making stronger assumptions or through different convergence analyses.
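To fix ideas, the setup described above can be written in the standard finite-sum form (a minimal formalization under common conventions; the smoothness constant $L$ and the Euclidean norm in the stationarity measure are standard assumptions rather than statements taken from this paper):

$$\min_{x \in \mathbb{R}^d} \; F(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$

where each component function $f_i$ is $L$-smooth and possibly nonconvex. An IFO call takes an index $i \in \{1, \dots, n\}$ and a point $x$ and returns the pair $(f_i(x), \nabla f_i(x))$. A point $x$ is $\epsilon$-suboptimal if $F(x) - \inf_y F(y) \le \epsilon$, and is an $\epsilon$-approximate stationary point if $\|\nabla F(x)\|_2 \le \epsilon$; the IFO complexity of an algorithm is the number of IFO calls it needs to produce such a point.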

References

[1] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 1983.

[2] Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity. FOCS, 1977.

[3] Julien Mairal. Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning. SIAM J. Optim., 2014.

[4] Yossi Arjevani and Ohad Shamir. Dimension-Free Iteration Complexity of Finite Sum Optimization Problems. NIPS, 2016.

[5] Quanquan Gu et al. Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization. NeurIPS, 2018.

[6] Katta G. Murty and Santosh N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Math. Program., 1987.

[7] Yair Carmon et al. Accelerated Methods for Nonconvex Optimization. SIAM J. Optim., 2018.

[8] Francis Bach et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. NIPS, 2014.

[9] Alberto Bietti and Julien Mairal. Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure. NIPS, 2016.

[10] Alekh Agarwal and Léon Bottou. A Lower Bound for the Optimization of Finite Sums. ICML, 2014.

[11] Tong Zhang et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator. NeurIPS, 2018.

[12] Zeyuan Allen-Zhu. Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter. ICML, 2017.

[13] Tengyu Ma et al. Finding approximate local minima faster than gradient descent. STOC, 2016.

[14] Lin Xiao and Tong Zhang. A Proximal Stochastic Gradient Method with Progressive Variance Reduction. SIAM J. Optim., 2014.

[15] Yair Carmon et al. Lower bounds for finding stationary points II: first-order methods. Mathematical Programming, 2017.

[16] Naman Agarwal and Elad Hazan. Lower Bounds for Higher-Order Convex Optimization. COLT, 2017.

[17] Sham M. Kakade et al. Faster Eigenvector Computation via Shift-and-Invert Preconditioning. ICML, 2016.

[18] Mark W. Schmidt et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. NIPS, 2012.

[19] Rie Johnson and Tong Zhang. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. NIPS, 2013.

[20] Guanghui Lan and Yi Zhou. An optimal randomized incremental gradient method. Mathematical Programming, 2015.

[21] Shai Shalev-Shwartz. SDCA without Duality, Regularization, and Individual Convexity. ICML, 2016.

[22] Guanghui Lan and Yu Yang. Accelerated Stochastic Algorithms for Nonconvex Finite-sum and Multi-block Optimization. arXiv:1805.05411, 2018.

[23] Zeyuan Allen-Zhu and Yuanzhi Li. Even Faster SVD Decomposition Yet Without Agonizing Pain. NIPS, 2016.

[24] Yair Carmon et al. Lower bounds for finding stationary points I. Mathematical Programming, 2017.

[25] Blake Woodworth and Nathan Srebro. Tight Complexity Bounds for Optimizing Composite Objectives. NIPS, 2016.

[26] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2004.

[27] Shai Shalev-Shwartz. SDCA without Duality. arXiv preprint, 2015.

[28] Ohad Shamir et al. Oracle complexity of second-order methods for smooth convex optimization. Mathematical Programming, 2017.

[29] Zeyuan Allen-Zhu. Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. ICML, 2018.

[30] Zeyuan Allen-Zhu. Katyusha: the first direct acceleration of stochastic gradient methods. J. Mach. Learn. Res., 2016.

[31] Tianbao Yang et al. Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number. ICML, 2019.