[1] Yee Whye Teh,et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.
[2] Tengyu Ma,et al. Finding approximate local minima faster than gradient descent , 2016, STOC.
[3] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[4] Shai Shalev-Shwartz,et al. On Graduated Optimization for Stochastic Non-Convex Problems , 2015, ICML.
[5] Yair Carmon,et al. Accelerated Methods for Non-Convex Optimization , 2016, SIAM J. Optim..
[6] Tengyu Ma,et al. Finding Approximate Local Minima for Nonconvex Optimization in Linear Time , 2016, ArXiv.
[7] Quoc V. Le,et al. Adding Gradient Noise Improves Learning for Very Deep Networks , 2015, ArXiv.
[8] Michael I. Jordan,et al. Gradient Descent Only Converges to Minimizers , 2016, COLT.
[9] Tony R. Martinez,et al. Iterative Non-linear Dimensionality Reduction with Manifold Sculpting , 2007, NIPS.
[10] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[11] Yair Carmon,et al. Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step , 2016, ArXiv.
[12] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[13] Kfir Y. Levy,et al. The Power of Normalization: Faster Evasion of Saddle Points , 2016, ArXiv.
[14] A. M. Mathai,et al. Quadratic forms in random variables : theory and applications , 1992 .
[15] Inderjit S. Dhillon,et al. Rank minimization via online learning , 2008, ICML '08.
[16] G. McLachlan,et al. Extensions of the EM Algorithm , 2007 .
[17] Balas K. Natarajan,et al. Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..
[18] Yuchen Zhang,et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics , 2017, COLT.
[19] Nicholas I. M. Gould,et al. Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results , 2011, Math. Program..
[20] Yurii Nesterov,et al. Cubic regularization of Newton method and its global performance , 2006, Math. Program..
[21] Haihao Lu,et al. Depth Creates No Bad Local Minima , 2017, ArXiv.
[22] Tianbao Yang,et al. First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time , 2017, NeurIPS.
[23] Jinghui Chen,et al. Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization , 2017, NeurIPS.
[24] Martin J. Wainwright,et al. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees , 2015, ArXiv.
[25] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[26] Emmanuel J. Candès,et al. Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..
[27] Matus Telgarsky,et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis , 2017, COLT.
[28] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion) , 1977 .
[29] Thomas Laurent,et al. Deep linear neural networks with arbitrary loss: All local minima are global , 2017, ArXiv.
[30] Zeyuan Allen-Zhu,et al. Natasha 2: Faster Non-Convex Optimization Than SGD , 2017, NeurIPS.
[31] Yee Whye Teh,et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics , 2014, J. Mach. Learn. Res..
[32] Z. Zabinsky. Random Search Algorithms , 2010 .
[33] Yonina C. Eldar,et al. Phase Retrieval via Matrix Completion , 2011, SIAM Rev..
[34] Chih-Jen Lin,et al. Projected Gradient Methods for Nonnegative Matrix Factorization , 2007, Neural Computation.
[35] Lukasz Kaiser,et al. Neural GPUs Learn Algorithms , 2015, ICLR.
[36] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[37] Daniel P. Robinson,et al. A trust region algorithm with a worst-case iteration complexity of O(ε^{-3/2}) , 2016, Mathematical Programming.