[1] Shang-Hua Teng,et al. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time , 2001, STOC '01.
[2] Ohad Shamir,et al. Learning Kernel-Based Halfspaces with the 0-1 Loss , 2011, SIAM J. Comput.
[3] Allan Pinkus,et al. Approximation theory of the MLP model in neural networks , 1999, Acta Numerica.
[4] E. Hille,et al. Contributions to the theory of Hermitian series. II. The representation problem , 1940 .
[5] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[6] Samet Oymak,et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks , 2019, IEEE Journal on Selected Areas in Information Theory.
[7] David Rolnick,et al. How to Start Training: The Effect of Initialization and Architecture , 2018, NeurIPS.
[8] Le Song,et al. Diverse Neural Network Learns True Target Functions , 2016, AISTATS.
[9] Arnaud Doucet,et al. On the Impact of the Activation Function on Deep Neural Networks Training , 2019, ICML.
[10] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.
[11] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput.
[12] Ryota Tomioka,et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning , 2014, ICLR.
[13] Quoc V. Le,et al. Searching for Activation Functions , 2018, arXiv.
[14] R. A. Silverman,et al. Special functions and their applications , 1966 .
[15] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[16] Boris Hanin,et al. Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients? , 2018, NeurIPS.
[17] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[18] Allan Pinkus,et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function , 1993, Neural Networks.
[19] Sepp Hochreiter,et al. Self-Normalizing Neural Networks , 2017, NIPS.
[20] Ruosong Wang,et al. On Exact Computation with an Infinitely Wide Neural Net , 2019, NeurIPS.
[21] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[22] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[23] Jeffrey Pennington,et al. Nonlinear random matrix theory for deep learning , 2017, NIPS.
[24] Surya Ganguli,et al. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice , 2017, NIPS.
[25] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[26] Jaehoon Lee,et al. Deep Neural Networks as Gaussian Processes , 2017, ICLR.
[27] Jason D. Lee,et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation , 2018, ICML.
[28] Kenji Doya,et al. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning , 2017, Neural Networks.
[29] John P. Boyd,et al. Asymptotic coefficients of Hermite function series , 1984 .
[30] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[31] J. Dicapua. Chebyshev Polynomials , 2019, Fibonacci and Lucas Numbers With Applications.
[32] T. Sanders,et al. Analysis of Boolean Functions , 2012, arXiv.
[33] Mu Li,et al. Revise Saturated Activation Functions , 2016, arXiv.
[34] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[35] Marcus Gallagher,et al. Invariance of Weight Distributions in Rectified MLPs , 2017, ICML.
[36] Shai Shalev-Shwartz,et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data , 2017, ICLR.
[37] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[38] H. Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung) , 1912 .
[39] Iryna Gurevych,et al. Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks , 2018, EMNLP.
[40] C. R. Rao,et al. Solutions to Some Functional Equations and Their Applications to Characterization of Probability Distributions , 2016 .
[41] Gábor Lugosi,et al. Concentration Inequalities , 2008, COLT.
[42] Zhenyu Liao,et al. A Random Matrix Approach to Neural Networks , 2017, arXiv.
[43] Noboru Murata,et al. Neural Network with Unbounded Activation Functions is Universal Approximator , 2015, arXiv:1505.03654.
[44] Joan Bruna,et al. On the Expressive Power of Deep Polynomial Neural Networks , 2019, NeurIPS.
[45] Stephen Marshall,et al. Activation Functions: Comparison of Trends in Practice and Research for Deep Learning , 2018, arXiv.
[46] Francis Bach,et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport , 2018, NeurIPS.
[47] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[48] T. J. Rivlin. The Chebyshev polynomials , 1974 .
[49] Yoram Singer,et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity , 2016, NIPS.
[50] M. Rudelson,et al. The smallest singular value of a random rectangular matrix , 2008, arXiv:0802.3956.
[51] Mikhail Belkin,et al. The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures , 2013, COLT.
[52] Jeffrey Pennington,et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network , 2018, NeurIPS.
[53] Surya Ganguli,et al. The Emergence of Spectral Universality in Deep Networks , 2018, AISTATS.
[54] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[55] Wei Hu,et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks , 2018, ICLR.
[56] S. Thangavelu. Lectures on Hermite and Laguerre expansions , 1993 .
[57] P. Massart,et al. Adaptive estimation of a quadratic functional by model selection , 2000 .