Wide neural networks: From non-Gaussian random fields at initialization to the NTK geometry of training