Momentum Batch Normalization for Deep Learning with Small Batch Size

[1]  Lei Huang,et al.  Iterative Normalization: Beyond Standardization Towards Efficient Whitening , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Ruimao Zhang,et al.  SSN: Learning Sparse Switchable Normalization via SparsestMax , 2019, International Journal of Computer Vision.

[3]  Ping Luo,et al.  Towards Understanding Regularization in Batch Normalization , 2018, ICLR.

[4]  Kaiming He,et al.  Group Normalization , 2018, International Journal of Computer Vision.

[5]  Yuan Xie,et al.  $L1$ -Norm Batch Normalization for Efficient Training of Deep Neural Networks , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[6]  Boris Flach,et al.  Stochastic Normalizations as Bayesian Learning , 2018, ACCV.

[7]  Carla P. Gomes,et al.  Understanding Batch Normalization , 2018, NeurIPS.

[8]  Aleksander Madry,et al.  How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) , 2018, NIPS 2018.

[9]  Qingyao Wu,et al.  Double Forward Propagation for Memorized Batch Normalization , 2018, AAAI.

[10]  Lei Huang,et al.  Decorrelated Batch Normalization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Minhyung Cho,et al.  Riemannian approach to batch normalization , 2017, NIPS.

[12]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Sergey Ioffe,et al.  Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models , 2017, NIPS.

[14]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[16]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[17]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[18]  Tim Salimans,et al.  Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.

[19]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[20]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Chong Wang,et al.  Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[22]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[23]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[25]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[26]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[27]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[28]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[29]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[30]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Guozhong An,et al.  The Effects of Adding Noise During Backpropagation Training on a Generalization Performance , 1996, Neural Computation.

[32]  Christopher M. Bishop,et al.  Current address: Microsoft Research, , 2022 .

[33]  L. Bottou Stochastic Gradient Learning in Neural Networks , 1991 .

[34]  Russell V. Lenth,et al.  Cumulative Distribution Function of the Noncentral T Distribution , 1989 .

[35]  L. Crocker,et al.  Introduction to Classical and Modern Test Theory , 1986 .