First-order methods almost always avoid saddle points: The case of vanishing step-sizes
Michael I. Jordan | Max Simchowitz | Benjamin Recht | Georgios Piliouras | Jason D. Lee | Ioannis Panageas