Paper information - Momentum Batch Normalization for Deep Learning with Small Batch Size - 字舞流文

Momentum Batch Normalization for Deep Learning with Small Batch Size

Lei Zhang | Deyu Meng | Xian-Sheng Hua | Hongwei Yong | Jianqiang Huang

[1] Lei Huang, et al. Iterative Normalization: Beyond Standardization Towards Efficient Whitening, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Ruimao Zhang, et al. SSN: Learning Sparse Switchable Normalization via SparsestMax, 2019, International Journal of Computer Vision.

[3] Ping Luo, et al. Towards Understanding Regularization in Batch Normalization, 2018, ICLR.

[4] Kaiming He, et al. Group Normalization, 2018, International Journal of Computer Vision.

[5] Yuan Xie, et al. $L1$-Norm Batch Normalization for Efficient Training of Deep Neural Networks, 2018, IEEE Transactions on Neural Networks and Learning Systems.

[6] Boris Flach, et al. Stochastic Normalizations as Bayesian Learning, 2018, ACCV.

[7] Carla P. Gomes, et al. Understanding Batch Normalization, 2018, NeurIPS.

[8] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.

[9] Qingyao Wu, et al. Double Forward Propagation for Memorized Batch Normalization, 2018, AAAI.

[10] Lei Huang, et al. Decorrelated Batch Normalization, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11] Minhyung Cho, et al. Riemannian approach to batch normalization, 2017, NIPS.

[12] Kaiming He, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13] Sergey Ioffe, et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models, 2017, NIPS.

[14] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Lei Zhang, et al. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising, 2016, IEEE Transactions on Image Processing.

[16] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, ArXiv.

[17] Oriol Vinyals, et al. Matching Networks for One Shot Learning, 2016, NIPS.

[18] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.

[19] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.

[22] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.

[23] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[25] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[26] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[27] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.

[28] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[29] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.

[30] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31] Guozhong An, et al. The Effects of Adding Noise During Backpropagation Training on a Generalization Performance, 1996, Neural Computation.

[32] Christopher M. Bishop, et al. Current address: Microsoft Research, 2022.

[33] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.

[34] Russell V. Lenth, et al. Cumulative Distribution Function of the Noncentral T Distribution, 1989.

[35] L. Crocker, et al. Introduction to Classical and Modern Test Theory, 1986.