The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization
[1] Murat A. Erdogdu, et al. High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation, 2022, NeurIPS.
[2] James Martens, et al. Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers, 2022, ICLR.
[3] Edward J. Hu, et al. Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer, 2022, ArXiv.
[4] Samuel S. Schoenholz, et al. Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping, 2021, ArXiv.
[5] Daniel A. Roberts, et al. The Principles of Deep Learning Theory, 2021, ArXiv.
[6] Sebastian Nowozin, et al. Precise characterization of the prior predictive distribution of deep ReLU networks, 2021, NeurIPS.
[7] Daniel M. Roy, et al. The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization, 2021, NeurIPS.
[8] Jacob A. Zavatone-Veth, et al. Asymptotics of representation learning in finite Bayesian neural networks, 2021, NeurIPS.
[9] Jared Tanner, et al. Activation function design for deep networks: linearity and effective initialisation, 2021, Applied and Computational Harmonic Analysis.
[10] Cengiz Pehlevan, et al. Exact marginal prior distributions of finite Bayesian neural networks, 2021, NeurIPS.
[11] Andrea Montanari, et al. Deep learning: a statistical viewpoint, 2021, Acta Numerica.
[12] G. Kutyniok, et al. Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?, 2020, MSML.
[13] Greg Yang, et al. Feature Learning in Infinite-Width Neural Networks, 2020, ArXiv.
[14] A. Doucet, et al. Stable ResNet, 2020, AISTATS.
[15] John Wright, et al. Deep Networks and the Multiple Manifold Problem, 2020, ICLR.
[16] Greg Yang, et al. Tensor Programs II: Neural Tangent Kernel for Any Architecture, 2020, ArXiv.
[17] Taiji Suzuki, et al. Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint, 2020, ICLR.
[18] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[19] Yuan Cao, et al. How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?, 2019, ICLR.
[20] Matus Telgarsky, et al. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks, 2019, ICLR.
[21] Sho Yaida, et al. Non-Gaussian processes and neural networks at finite widths, 2019, MSML.
[22] Boris Hanin, et al. Finite Depth and Width Corrections to the Neural Tangent Kernel, 2019, ICLR.
[23] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[24] Arnaud Doucet, et al. On the Impact of the Activation Function on Deep Neural Networks Training, 2019, ICML.
[25] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[26] Greg Yang, et al. Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation, 2019, ArXiv.
[27] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[28] M. Nica, et al. Products of Many Large Random Matrices and Gradients in Deep Neural Networks, 2018, Communications in Mathematical Physics.
[29] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[30] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[31] Jaehoon Lee, et al. Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes, 2018, ICLR.
[33] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[34] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[35] Konstantinos Spiliopoulos, et al. Mean Field Analysis of Neural Networks: A Law of Large Numbers, 2018, SIAM J. Appl. Math.
[36] Grant M. Rotskoff, et al. Trainability and Accuracy of Artificial Neural Networks: An Interacting Particle System Approach, 2018, Communications on Pure and Applied Mathematics.
[37] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[38] David Rolnick, et al. How to Start Training: The Effect of Initialization and Architecture, 2018, NeurIPS.
[39] Samuel S. Schoenholz, et al. Mean Field Residual Networks: On the Edge of Chaos, 2017, NIPS.
[40] Jeffrey Pennington, et al. Deep Neural Networks as Gaussian Processes, 2017, ICLR.
[41] Trevor Campbell, et al. Automated Scalable Bayesian Inference via Hilbert Coresets, 2017, J. Mach. Learn. Res.
[42] Andy R. Terrel, et al. SymPy: Symbolic computing in Python, 2017, PeerJ Prepr.
[43] Surya Ganguli, et al. Deep Information Propagation, 2016, ICLR.
[44] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[46] Lawrence K. Saul, et al. Kernel Methods for Deep Learning, 2009, NIPS.
[47] S. Ethier, et al. Markov Processes: Characterization and Convergence, 2005.
[48] B. Hanin. Correlation Functions in Random Fully Connected Neural Networks at Finite Width, 2022, ArXiv.
[49] O. Kallenberg. Foundations of Modern Probability, 2021, Probability Theory and Stochastic Modelling.
[50] Heng Huang, et al. On the Random Conjugate Kernel and Neural Tangent Kernel, 2021, ICML.
[53] Xiongzhi Chen. Brownian Motion and Stochastic Calculus, 2008.
[55] Martin Raič, et al. Normal Approximation by Stein's Method, 2003.
[56] Radford M. Neal. Bayesian learning for neural networks, 1995.
[57] D. W. Stroock, et al. Multidimensional Diffusion Processes, 1979.