ING OF OVERPARAMETRIZED NEURAL NETS
[1] H. Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung), 1912.
[2] E. Hille, et al. Contributions to the theory of Hermitian series. II. The representation problem, 1940.
[3] R. A. Silverman, et al. Special functions and their applications, 1966.
[4] John P. Boyd, et al. Asymptotic coefficients of Hermite function series, 1984.
[5] S. Thangavelu. Lectures on Hermite and Laguerre expansions, 1993.
[6] Allan Pinkus, et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function, 1991, Neural Networks.
[7] Allan Pinkus, et al. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.
[8] P. Massart, et al. Adaptive estimation of a quadratic functional by model selection, 2000.
[9] Shang-Hua Teng, et al. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time, 2001, STOC '01.
[10] Gábor Lugosi, et al. Concentration Inequalities, 2008, COLT.
[11] R. Varga. Geršgorin and His Circles, 2004.
[12] J. Dicapua. Chebyshev Polynomials, 2019, Fibonacci and Lucas Numbers With Applications.
[13] Rene F. Swarttouw, et al. Orthogonal polynomials, 2020, NIST Handbook of Mathematical Functions.
[14] M. Rudelson, et al. The smallest singular value of a random rectangular matrix, 2008, ArXiv:0802.3956.
[15] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[16] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[17] Ohad Shamir, et al. Learning Kernel-Based Halfspaces with the 0-1 Loss, 2011, SIAM J. Comput.
[18] T. Sanders, et al. Analysis of Boolean Functions, 2012, ArXiv.
[19] Mikhail Belkin, et al. The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures, 2013, COLT.
[20] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[21] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[22] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[23] Mu Li, et al. Revise Saturated Activation Functions, 2016, ArXiv.
[24] C. R. Rao, et al. Solutions to Some Functional Equations and Their Applications to Characterization of Probability Distributions, 2016.
[25] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[26] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[27] Sepp Hochreiter, et al. Self-Normalizing Neural Networks, 2017, NIPS.
[28] Zhenyu Liao, et al. A Random Matrix Approach to Neural Networks, 2017, ArXiv.
[29] Surya Ganguli, et al. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice, 2017, NIPS.
[30] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2019, NIPS.
[31] Le Song, et al. Diverse Neural Network Learns True Target Functions, 2016, AISTATS.
[32] Boris Hanin, et al. Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?, 2018, NeurIPS.
[33] Jeffrey Pennington, et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network, 2018, NeurIPS.
[34] Jaehoon Lee, et al. Deep Neural Networks as Gaussian Processes, 2017, ICLR.
[35] Surya Ganguli, et al. The Emergence of Spectral Universality in Deep Networks, 2018, AISTATS.
[36] Stephen Marshall, et al. Activation Functions: Comparison of Trends in Practice and Research for Deep Learning, 2018, ArXiv.
[37] Marcus Gallagher, et al. Invariance of Weight Distributions in Rectified MLPs, 2017, ICML.
[38] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[39] Kenji Doya, et al. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning, 2017, Neural Networks.
[40] Quoc V. Le, et al. Searching for Activation Functions, 2018, ArXiv.
[41] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[42] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[43] David Rolnick, et al. How to Start Training: The Effect of Initialization and Architecture, 2018, NeurIPS.
[44] Iryna Gurevych, et al. Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP Tasks, 2018, EMNLP.
[45] Jason D. Lee, et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation, 2018, ICML.
[46] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[47] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[48] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[49] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[50] Wei Hu, et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks, 2018, ICLR.
[51] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[52] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[53] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[54] Joan Bruna, et al. On the Expressive Power of Deep Polynomial Neural Networks, 2019, NeurIPS.
[55] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[56] Arnaud Doucet, et al. On the Impact of the Activation Function on Deep Neural Networks Training, 2019, ICML.
[57] Samet Oymak, et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks, 2019, IEEE Journal on Selected Areas in Information Theory.