[1] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[2] Quynh Nguyen,et al. On Connected Sublevel Sets in Deep Learning , 2019, ICML.
[3] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[4] Dawei Li,et al. On the Benefit of Width for Neural Networks: Disappearance of Basins , 2018, SIAM J. Optim..
[5] Yuan Cao,et al. How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks? , 2019, ICLR.
[6] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[7] Gábor Lugosi,et al. Concentration Inequalities: A Nonasymptotic Theory of Independence , 2013, Oxford University Press.
[8] Nathan Srebro,et al. The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res..
[9] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[10] Taiji Suzuki,et al. Refined Generalization Analysis of Gradient Descent for Over-parameterized Two-layer Neural Networks with Smooth Activations on Classification Problems , 2019, ArXiv.
[11] Matthias Hein,et al. On the loss landscape of a class of deep neural networks with no bad local valleys , 2018, ICLR.
[12] Xin Yang,et al. Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound , 2019, ArXiv.
[13] Dawei Li,et al. Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations , 2018, ArXiv.
[14] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[15] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[16] Francis Bach,et al. On Lazy Training in Differentiable Programming , 2018, NeurIPS.
[17] R. Adamczak,et al. A note on the Hanson-Wright inequality for random vectors with dependencies , 2014, 1409.8457.
[18] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[19] S. Bobkov,et al. Higher order concentration of measure , 2017, Communications in Contemporary Mathematics.
[20] Adel Javanmard,et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks , 2017, IEEE Transactions on Information Theory.
[21] Quanquan Gu,et al. An Improved Analysis of Training Over-parameterized Deep Neural Networks , 2019, NeurIPS.
[22] J. Dolbeault,et al. Sharp Interpolation Inequalities on the Sphere: New Methods and Consequences , 2012, 1210.1853.
[23] O. Papaspiliopoulos. High-Dimensional Probability: An Introduction with Applications in Data Science , 2020 .
[24] Junmo Kim,et al. Deep Pyramidal Residual Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Amit Daniely,et al. Neural Networks Learning and Memorization with (almost) no Over-Parameterization , 2019, NeurIPS.
[26] Quoc V. Le,et al. The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study , 2019, ICML.
[27] Suvrit Sra,et al. Small nonlinearities in activation functions create bad local minima in neural networks , 2018, ICLR.
[28] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[29] Jaehoon Lee,et al. On the infinite width limit of neural networks with a standard parameterization , 2020, ArXiv.
[30] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[31] David Haussler,et al. What Size Net Gives Valid Generalization? , 1989, Neural Computation.
[32] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[33] Roman Vershynin,et al. Memory capacity of neural networks with threshold and ReLU activations , 2020, ArXiv.
[34] Roman Vershynin,et al. High-Dimensional Probability , 2018 .
[35] Milton Abramowitz,et al. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables , 1964 .
[36] G. Stewart. Perturbation theory for the singular value decomposition , 1990 .
[37] R. Adamczak,et al. Restricted Isometry Property of Matrices with Independent Columns and Neighborly Polytopes by Random Sampling , 2009, 0904.4723.
[38] Matthias Hein,et al. Optimization Landscape and Expressivity of Deep CNNs , 2017, ICML.
[39] Samet Oymak,et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks , 2019, IEEE Journal on Selected Areas in Information Theory.
[40] Peter Auer,et al. Exponentially many local minima for single neurons , 1995, NIPS.
[41] R. Adamczak,et al. Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order , 2013, 1304.1826.
[42] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[43] Peter L. Bartlett,et al. Nearly-tight VC-dimension and Pseudodimension Bounds for Piecewise Linear Neural Networks , 2017, J. Mach. Learn. Res..
[44] Xiaoxia Wu,et al. Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network , 2019, ArXiv.
[45] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[46] B. Mityagin. The Zero Set of a Real Analytic Function , 2015, Mathematical Notes.
[47] Eric B. Baum,et al. On the capabilities of multilayer perceptrons , 1988, J. Complex..
[48] Rong Ge,et al. Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently , 2019, ArXiv.
[49] Matthias Hein,et al. The Loss Surface of Deep and Wide Neural Networks , 2017, ICML.
[50] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[51] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[52] Suvrit Sra,et al. Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity , 2018, NeurIPS.
[53] Shai Shalev-Shwartz,et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data , 2017, ICLR.
[54] Daniel W. Stroock,et al. Moment estimates derived from Poincaré and logarithmic Sobolev inequalities , 1994 .
[55] Matus Telgarsky,et al. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks , 2020, ICLR.