Convergence of gradient descent for deep neural networks
[1] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[2] Mihai Anitescu, et al. Degenerate Nonlinear Programming with a Quadratic Growth Condition, 1999, SIAM J. Optim.
[3] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[4] Mikhail Belkin, et al. Basis Learning as an Algorithmic Primitive, 2014, COLT.
[5] Andrea Montanari, et al. Matrix completion from a few entries, 2009, 2009 IEEE International Symposium on Information Theory.
[6] Wei Hu, et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks, 2018, ICLR.
[7] John C. Duchi, et al. Lower bounds for non-convex stochastic optimization, 2019, Mathematical Programming.
[8] Y. Nesterov, et al. Linear convergence of first order methods for non-strongly convex optimization, 2015, Math. Program.
[9] Zheng Xu, et al. The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent, 2019, ICML.
[10] Quanquan Gu, et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks, 2019, AAAI.
[11] Samy Bengio, et al. Understanding deep learning (still) requires rethinking generalization, 2021, Commun. ACM.
[12] Stephen J. Wright, et al. An asynchronous parallel stochastic coordinate descent algorithm, 2013, J. Mach. Learn. Res.
[13] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[14] Z.-Q. Luo, et al. Error bounds and convergence analysis of feasible descent methods: a general approach, 1993, Ann. Oper. Res.
[15] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[16] Katta G. Murty, et al. Some NP-complete problems in quadratic and nonlinear programming, 1987, Math. Program.
[17] Mikhail Belkin, et al. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation, 2021, Acta Numerica.
[18] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[19] Xiaodong Li, et al. Phase Retrieval from Coded Diffraction Patterns, 2013, ArXiv:1310.3240.
[20] Yuan Cao, et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks, 2019, NeurIPS.
[21] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[22] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[23] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[24] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[25] Xiaodong Li, et al. Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization, 2016, Applied and Computational Harmonic Analysis.
[26] Hédy Attouch, et al. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality, 2008, Math. Oper. Res.
[27] A. Ioffe. Metric regularity and subdifferential calculus, 2000.
[28] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[29] Hui Zhang, et al. Gradient methods for convex minimization: better rates under weaker conditions, 2013, ArXiv.
[30] Philip M. Long, et al. Gradient Descent with Identity Initialization Efficiently Learns Positive-Definite Linear Transformations by Deep Residual Networks, 2018, Neural Computation.
[31] Lei Wu, et al. A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics, 2019, Science China Mathematics.
[32] Feng Ruan, et al. Stochastic Methods for Composite and Weakly Convex Optimization Problems, 2017, SIAM J. Optim.
[33] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[34] Sebastian Ruder, et al. An overview of gradient descent optimization algorithms, 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.
[35] Sanjeev Arora, et al. Simple, Efficient, and Neural Algorithms for Sparse Coding, 2015, COLT.
[36] Arnulf Jentzen, et al. On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks, 2021, ArXiv.
[37] Xiaodong Li, et al. Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow, 2015, ArXiv.
[38] Mikhail Belkin, et al. Loss landscapes and optimization in over-parameterized non-linear systems and neural networks, 2020, Applied and Computational Harmonic Analysis.
[39] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[40] Yuxin Chen, et al. Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems, 2015, NIPS.
[41] Zhi-Quan Luo, et al. Guaranteed Matrix Completion via Non-Convex Factorization, 2014, IEEE Transactions on Information Theory.
[42] K. Kurdyka. On gradients of functions definable in o-minimal structures, 1998.
[43] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[44] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[45] Dmitriy Drusvyatskiy, et al. Curves of Descent, 2012, SIAM J. Control Optim.
[46] Tianqi Chen, et al. Empirical Evaluation of Rectified Activations in Convolutional Network, 2015, ArXiv.
[47] Lei Wu. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective, 2018.
[48] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[49] Andrea Montanari, et al. Deep learning: a statistical viewpoint, 2021, Acta Numerica.
[50] Prateek Jain, et al. Low-rank matrix completion using alternating minimization, 2012, STOC '13.
[51] Ruoyu Sun. Optimization for deep learning: theory and algorithms, 2019, ArXiv.
[52] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[53] Eduard A. Gorbunov, et al. Recent Theoretical Advances in Non-Convex Optimization, 2020, ArXiv.
[54] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[55] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[56] Yair Carmon, et al. Accelerated Methods for NonConvex Optimization, 2018, SIAM J. Optim.
[57] Xi Chen, et al. Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing, 2014, J. Mach. Learn. Res.
[58] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[59] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[60] Praneeth Netrapalli. Stochastic Gradient Descent and Its Variants in Machine Learning, 2019, Journal of the Indian Institute of Science.
[61] Francis R. Bach, et al. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.
[62] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[63] Martin J. Wainwright, et al. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees, 2015, ArXiv.
[64] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper), 2018, NeurIPS.