Tuo Zhao | Enlu Zhou | Tianyi Liu | Zhehui Chen
[1] Desmond J. Higham, et al. Numerical Methods for Ordinary Differential Equations - Initial Value Problems, 2010, Springer Undergraduate Mathematics Series.
[2] Junwei Lu, et al. Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization, 2016, 2018 Information Theory and Applications Workshop (ITA).
[3] Junwei Lu, et al. Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization, 2016, ArXiv.
[4] Serik Sagitov. Weak Convergence of Probability Measures, 2020.
[5] John Wright, et al. A Geometric Analysis of Phase Retrieval, 2016, 2016 IEEE International Symposium on Information Theory (ISIT).
[6] V. Borkar. Stochastic approximation with two time scales, 1997.
[7] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Shang Wu, et al. Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms, 2017, J. Mach. Learn. Res.
[9] Brian David Nowakowski. On Multi-parameter Semimartingales, Their Integrals and Weak Convergence, 2013.
[10] Michael I. Jordan, et al. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent, 2017, COLT.
[11] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[12] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[13] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[14] Saeed Ghadimi, et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming, 2013, Mathematical Programming.
[15] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[17] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[18] Peter L. Bartlett, et al. Acceleration and Averaging in Stochastic Mirror Descent Dynamics, 2017, arXiv:1707.06219.
[19] Lin F. Yang, et al. Dropping Convexity for More Efficient and Scalable Online Multiview Learning, 2017, arXiv:1702.08134.
[20] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[21] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[22] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[23] Tengyu Ma, et al. Matrix Completion has No Spurious Local Minimum, 2016, NIPS.
[24] Tuo Zhao, et al. Online Multiview Representation Learning: Dropping Convexity for Better Efficiency, 2017, ArXiv.
[25] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[26] H. Robbins. A Stochastic Approximation Method, 1951.
[27] Erkki Oja, et al. Principal components, minor components, and linear neural networks, 1992, Neural Networks.
[28] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[29] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[30] Han Liu, et al. Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes, 2018, NIPS.
[31] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[32] Georgios Piliouras, et al. Gradient Descent Converges to Minimizers: The Case of Non-Isolated Critical Points, 2016, ArXiv.
[33] S. Shreve, et al. Stochastic differential equations, 1955, Mathematical Proceedings of the Cambridge Philosophical Society.
[34] T. Poggio, et al. Theory of Deep Learning III: Generalization Properties of SGD, Memo No. 067, June 27, 2017.
[35] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[36] Silvere Bonnabel, et al. Stochastic Gradient Descent on Riemannian Manifolds, 2011, IEEE Transactions on Automatic Control.
[37] Terence D. Sanger, et al. Optimal unsupervised learning in a single-layer linear feedforward neural network, 1989, Neural Networks.
[38] E. Oja, et al. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix, 1985.
[39] Le Song, et al. Deep Hyperspherical Learning, 2017, NIPS.
[40] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.