How to Initialize your Network? Robust Initialization for WeightNorm & ResNets