How to Escape Saddle Points Efficiently
Chi Jin | Rong Ge | Praneeth Netrapalli | Sham M. Kakade | Michael I. Jordan
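The paper's central idea is perturbed gradient descent: run ordinary gradient descent, and whenever the gradient becomes small, inject a small amount of noise drawn uniformly from a ball so the iterate can slide off strict saddle points rather than stall at them. The sketch below is a minimal Python illustration of that idea; the step size, perturbation radius, trigger threshold, and toy objective are illustrative assumptions, not the parameters analyzed in the paper, and the paper's termination test after a perturbation is omitted.

import numpy as np

def perturbed_gradient_descent(grad, x0, eta=0.01, radius=0.1,
                               grad_tol=1e-3, perturb_gap=50,
                               max_iters=10000, rng=None):
    # Gradient descent with occasional uniform-ball perturbations.
    # When the gradient is small (a candidate saddle or minimum) and enough
    # steps have passed since the last perturbation, add noise sampled
    # uniformly from a ball of the given radius. All constants here are
    # placeholders for illustration only.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    last_perturb = -perturb_gap
    for t in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= grad_tol and t - last_perturb >= perturb_gap:
            d = x.size
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)                  # uniform direction on the sphere
            r = radius * rng.random() ** (1.0 / d)  # radius giving a uniform draw from the ball
            x = x + r * u
            last_perturb = t
        else:
            x = x - eta * g
    return x

# Toy usage: f(x, y) = x^4 - 2x^2 + y^2 has a strict saddle at the origin and
# minimizers at (+1, 0) and (-1, 0). Started exactly at the saddle, plain
# gradient descent never moves; the perturbed variant drifts to a minimizer.
grad_f = lambda z: np.array([4 * z[0] ** 3 - 4 * z[0], 2 * z[1]])
print(perturbed_gradient_descent(grad_f, x0=np.zeros(2)))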
[1] Boris Polyak. Gradient methods for the minimisation of functionals , 1963.
[2] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[3] Yurii Nesterov,et al. Cubic regularization of Newton method and its global performance , 2006, Math. Program..
[4] Vladimír Lacko,et al. On decompositional algorithms for uniform sampling from n-spheres and n-balls , 2010, J. Multivar. Anal..
[5] Yann LeCun,et al. The Loss Surface of Multilayer Networks , 2014, ArXiv.
[6] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[7] Prateek Jain,et al. Computing Matrix Squareroot via Non Convex Local Search , 2015, ArXiv.
[8] Sébastien Bubeck,et al. Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn..
[9] Zhi-Quan Luo,et al. Guaranteed Matrix Completion via Non-Convex Factorization , 2014, IEEE Transactions on Information Theory.
[10] Prateek Jain,et al. Phase Retrieval Using Alternating Minimization , 2013, IEEE Transactions on Signal Processing.
[11] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[12] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[13] Xiaodong Li,et al. Phase Retrieval via Wirtinger Flow: Theory and Algorithms , 2014, IEEE Transactions on Information Theory.
[14] John Wright,et al. A Geometric Analysis of Phase Retrieval , 2016, 2016 IEEE International Symposium on Information Theory (ISIT).
[15] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[16] Mark W. Schmidt,et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition , 2016, ECML/PKDD.
[17] John D. Lafferty,et al. Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent , 2016, ArXiv.
[18] Nathan Srebro,et al. Global Optimality of Local Search for Low Rank Matrix Recovery , 2016, NIPS.
[19] Yair Carmon,et al. Accelerated Methods for Non-Convex Optimization , 2016, SIAM J. Optim..
[20] Kfir Y. Levy,et al. The Power of Normalization: Faster Evasion of Saddle Points , 2016, ArXiv.
[21] Yair Carmon,et al. Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step , 2016, ArXiv.
[22] Michael I. Jordan,et al. Gradient Descent Only Converges to Minimizers , 2016, COLT.
[23] Tengyu Ma,et al. Matrix Completion has No Spurious Local Minimum , 2016, NIPS.
[24] Tengyu Ma,et al. Finding Approximate Local Minima for Nonconvex Optimization in Linear Time , 2016, ArXiv.
[25] John Wright,et al. Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture , 2015, IEEE Transactions on Information Theory.
[26] Daniel P. Robinson,et al. A trust region algorithm with a worst-case iteration complexity of O(ε^{-3/2}) for nonconvex optimization , 2016, Mathematical Programming.
[27] Anastasios Kyrillidis,et al. Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach , 2016, AISTATS.
[28] Tengyu Ma,et al. Finding approximate local minima faster than gradient descent , 2016, STOC.
[29] Yair Carmon,et al. Accelerated Methods for Non-Convex Optimization , 2018, SIAM J. Optim..