On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization
[1] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[2] Robert C. Qiu, et al. Spectrum Concentration in Deep Residual Learning: A Free Probability Approach, 2018, IEEE Access.
[3] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.
[4] Jaehoon Lee, et al. Deep Neural Networks as Gaussian Processes, 2017, ICLR.
[5] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[6] Richard Yi Da Xu, et al. Mean field theory for deep dropout networks: digging up gradient backpropagation deeply, 2020, ECAI.
[7] Jaehoon Lee, et al. On the infinite width limit of neural networks with a standard parameterization, 2020, ArXiv.
[8] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[9] Greg Yang, et al. Tensor Programs II: Neural Tangent Kernel for Any Architecture, 2020, ArXiv.
[10] Mark S. Nixon, et al. Feature extraction & image processing for computer vision, 2012.
[11] Greg Yang, et al. Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation, 2019, ArXiv.
[12] S. Chatterjee, et al. Multivariate Normal Approximation Using Exchangeable Pairs, 2007, math/0701464.
[13] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[14] Jiaoyang Huang, et al. Dynamics of Deep Neural Networks and Neural Tangent Hierarchy, 2019, ICML.
[15] Surya Ganguli, et al. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice, 2017, NIPS.
[16] Surya Ganguli, et al. The Emergence of Spectral Universality in Deep Networks, 2018, AISTATS.
[17] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks (invited paper), 2018, NeurIPS.
[18] Il Park, et al. Information Geometry of Orthogonal Initializations and Training, 2018, ICLR.
[19] Richard E. Turner, et al. Gaussian Process Behaviour in Wide Deep Neural Networks, 2018, ICLR.
[20] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[21] Surya Ganguli, et al. Deep Information Propagation, 2016, ICLR.
[22] Jascha Sohl-Dickstein, et al. A Mean Field Theory of Batch Normalization, 2019, ICLR.
[23] Samuel S. Schoenholz, et al. Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks, 2018, ICML.
[24] Mikhail Belkin, et al. On the linearity of large non-linear models: when and why the tangent kernel is constant, 2020, NeurIPS.
[25] Samuel S. Schoenholz, et al. Mean Field Residual Networks: On the Edge of Chaos, 2017, NIPS.
[26] Weitao Du. Constructing exchangeable pairs by diffusion on manifolds and its application, 2020, arXiv:2006.09460.
[27] Jascha Sohl-Dickstein, et al. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks, 2018, ICML.
[28] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[29] Jascha Sohl-Dickstein, et al. The large learning rate phase of deep learning: the catapult mechanism, 2020, ArXiv.
[30] Jeffrey Pennington, et al. Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks, 2020, ICLR.
[31] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[32] Jaehoon Lee, et al. Neural Tangents: Fast and Easy Infinite Neural Networks in Python, 2019, ICLR.
[33] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[34] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[36] Jacek Tabor, et al. Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function, 2018, AISTATS.
[37] Shun-ichi Amari. Understand It in 5 Minutes!? Skim-Reading a Famous Paper: Jacot, Arthur, Gabriel, Franck, and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020 (in Japanese).
[38] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[39] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[40] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.