On Nonconvex Optimization for Machine Learning
