The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks