On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics
Xi Chen | Simon S. Du | Xin T. Tong