Norm matters: efficient and accurate normalization schemes in deep networks
Elad Hoffer | Ron Banner | Itay Golan | Daniel Soudry