Wide neural networks of any depth evolve as linear models under gradient descent
Jaehoon Lee | Lechao Xiao | Samuel S. Schoenholz | Yasaman Bahri | Roman Novak | Jascha Sohl-Dickstein | Jeffrey Pennington
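For context, the paper's central claim is that a sufficiently wide network trained by gradient descent stays close to its first-order Taylor expansion in the parameters around initialization, f_lin(θ, x) = f(θ₀, x) + ∇_θ f(θ₀, x)·(θ − θ₀). The sketch below illustrates that linearized model in JAX; the two-layer ReLU network, the 1/√fan-in (NTK-style) scaling, the chosen width, and all function names are illustrative assumptions, not the authors' released code.

```python
import jax
import jax.numpy as jnp

def init_params(key, d_in=10, width=4096, d_out=1):
    # Wide two-layer ReLU network in an NTK-style parameterization:
    # weights drawn i.i.d. N(0, 1); the forward pass rescales by 1/sqrt(fan_in).
    k1, k2 = jax.random.split(key)
    return {"w1": jax.random.normal(k1, (d_in, width)),
            "w2": jax.random.normal(k2, (width, d_out))}

def f(params, x):
    h = jnp.maximum(x @ params["w1"] / jnp.sqrt(x.shape[-1]), 0.0)
    return h @ params["w2"] / jnp.sqrt(h.shape[-1])

def linearize_at(params0):
    # First-order Taylor expansion of f in the parameters around params0:
    # f_lin(params, x) = f(params0, x) + J_f(params0, x) @ (params - params0).
    def f_lin(params, x):
        dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
        y0, dy = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
        return y0 + dy
    return f_lin

# Usage sketch: train f by gradient descent; the paper's result is that its
# outputs track those of f_lin ever more closely as the width grows.
key = jax.random.PRNGKey(0)
params0 = init_params(key)
f_lin = linearize_at(params0)
x = jax.random.normal(jax.random.PRNGKey(1), (8, 10))
print(jnp.max(jnp.abs(f(params0, x) - f_lin(params0, x))))  # ~0 at initialization
```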