[1] Sanjeev Arora, et al. An Exponential Learning Rate Schedule for Deep Learning, 2020, ICLR.
[2] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[3] Hao Li, et al. On the effect of Batch Normalization and Weight Normalization in Generative Adversarial Networks, 2017, ArXiv.
[4] David Rolnick, et al. Complexity of Linear Regions in Deep Networks, 2019, ICML.
[5] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[6] Sanjeev Arora, et al. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization, 2018, ICLR.
[7] Kevin Smith, et al. Bayesian Uncertainty Estimation for Batch Normalized Deep Networks, 2018, ICML.
[8] Boris Flach, et al. Stochastic Normalizations as Bayesian Learning, 2018, ACCV.
[9] Carla P. Gomes, et al. Understanding Batch Normalization, 2018, NeurIPS.
[10] Graham W. Taylor, et al. Batch Normalization is a Cause of Adversarial Vulnerability, 2019, ArXiv.
[11] Michael James, et al. Online Normalization for Training Neural Networks, 2019, NeurIPS.
[12] Jonathon Shlens, et al. A Learned Representation For Artistic Style, 2016, ICLR.
[13] Venu Govindaraju, et al. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks, 2016, ICML.
[14] Jonathon Shlens, et al. Accelerating Training of Deep Neural Networks with a Standardization Loss, 2019, ArXiv.
[15] Kihyuk Sohn, et al. Exploring Normalization in Deep Residual Networks with Concatenated Rectified Linear Units, 2017, AAAI.
[16] Alan L. Yuille, et al. Intriguing Properties of Adversarial Training at Scale, 2020, ICLR.
[17] Andrea Vedaldi, et al. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016, ICML.
[18] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[19] Jascha Sohl-Dickstein, et al. A Mean Field Theory of Batch Normalization, 2019, ICLR.
[20] Pascal Vincent, et al. Recurrent Normalization Propagation, 2017, ICLR.
[21] Kaiming He, et al. Group Normalization, 2018, ECCV.
[22] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.
[23] Shankar Krishnan, et al. Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks, 2020, CVPR.
[24] Justin Johnson, et al. Rethinking "Batch" in BatchNorm, 2021, ArXiv.
[25] Dawn Xiaodong Song, et al. Gradients explode - Deep Networks are shallow - ResNet explained, 2017, ICLR.
[26] Joscha Bach, et al. Mean Shift Rejection: Training Deep Neural Networks Without Minibatch Statistics or Normalization, 2019, ArXiv.
[27] Bhiksha Raj, et al. Is normalization indispensable for training deep neural network?, 2020, NeurIPS.
[28] Shankar Krishnan, et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density, 2019, ICML.
[29] Ruimao Zhang, et al. Differentiable Dynamic Normalization for Learning Deep Representation, 2019, ICML.
[30] Tengyu Ma, et al. Fixup Initialization: Residual Learning Without Normalization, 2019, ICLR.
[31] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[32] Andrea Vedaldi, et al. Improved Texture Networks: Maximizing Quality and Diversity in Feed-Forward Stylization and Texture Synthesis, 2017, CVPR.
[33] Francis Bach, et al. Batch normalization provably avoids rank collapse for randomly initialised deep networks, 2020, NeurIPS.
[34] Quoc V. Le, et al. Evolving Normalization-Activation Layers, 2020, NeurIPS.
[35] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, CVPR.
[36] Allan Pinkus, et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function, 1991, Neural Networks.
[37] Leon A. Gatys, et al. Image Style Transfer Using Convolutional Neural Networks, 2016, CVPR.
[38] Boris Ginsburg, et al. Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification, 2017, ArXiv.
[39] Quoc V. Le, et al. Adversarial Examples Improve Image Recognition, 2020, CVPR.
[40] Quoc V. Le, et al. AutoAugment: Learning Augmentation Strategies From Data, 2019, CVPR.
[41] Seong Joon Oh, et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features, 2019, ICCV.
[42] Michael I. Jordan, et al. Transferable Normalization: Towards Improving Transferability of Deep Neural Networks, 2019, NeurIPS.
[43] Guodong Zhang, et al. Three Mechanisms of Weight Decay Regularization, 2018, ICLR.
[44] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[45] Twan van Laarhoven, et al. L2 Regularization versus Batch and Weight Normalization, 2017, ArXiv.
[46] Zach Eaton-Rosen, et al. Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training, 2021, ArXiv.
[47] David Rolnick, et al. How to Start Training: The Effect of Initialization and Architecture, 2018, NeurIPS.
[48] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, ArXiv.
[49] Lei Huang, et al. Group Whitening: Balancing Learning Efficiency and Representational Capacity, 2020, ArXiv.
[50] Jascha Sohl-Dickstein, et al. Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence, 2020, ArXiv.
[51] Arthur Jacot, et al. Freeze and Chaos for DNNs: an NTK view of Batch Normalization, Checkerboard and Boundary Effects, 2019, ArXiv.
[52] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, CVPR.
[53] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2017, NIPS.
[54] Takeru Miyato, et al. cGANs with Projection Discriminator, 2018, ICLR.
[55] Sergey Ioffe, et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models, 2017, NIPS.
[56] Robert P. Dick, et al. Beyond BatchNorm: Towards a General Understanding of Normalization in Deep Learning, 2021, ArXiv.
[57] Jian Sun, et al. Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization, 2020, ICLR.
[58] Hakan Bilen, et al. Mode Normalization, 2018, ICLR.
[59] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[60] Michael J. Dinneen, et al. Four Things Everyone Should Know to Improve Batch Normalization, 2019, ICLR.
[61] Samuel L. Smith, et al. Characterizing signal propagation to close the performance gap in unnormalized ResNets, 2021, ICLR.
[62] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[63] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[64] Lei Huang, et al. Centered Weight Normalization in Accelerating Training of Deep Neural Networks, 2017, ICCV.
[65] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[66] Zhanxing Zhu, et al. Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay, 2020, ArXiv.
[67] K. Simonyan, et al. High-Performance Large-Scale Image Recognition Without Normalization, 2021, ICML.
[68] Rico Sennrich, et al. Root Mean Square Layer Normalization, 2019, NeurIPS.
[69] Ping Luo, et al. Differentiable Learning-to-Normalize via Switchable Normalization, 2018, ICLR.
[70] Lucas Beyer, et al. Big Transfer (BiT): General Visual Representation Learning, 2020, ECCV.
[71] Boris Flach, et al. Normalization of Neural Networks using Analytic Variance Propagation, 2018, ArXiv.
[72] Ping Luo, et al. Towards Understanding Regularization in Batch Normalization, 2018, ICLR.
[73] Samuel L. Smith, et al. Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks, 2020, NeurIPS.
[74] Hugo Larochelle, et al. Modulating early visual processing by language, 2017, NIPS.
[75] Thomas Hofmann, et al. Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization, 2018, AISTATS.
[76] Renjie Liao, et al. Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes, 2016, ICLR.
[77] David Rolnick, et al. Deep ReLU Networks Have Surprisingly Few Activation Patterns, 2019, NeurIPS.
[78] Sepp Hochreiter, et al. Self-Normalizing Neural Networks, 2017, NIPS.
[79] Carlo Luschi, et al. Revisiting Small Batch Training for Deep Neural Networks, 2018, ArXiv.
[80] Yann Dauphin, et al. Deconstructing the Regularization of BatchNorm, 2021, ICLR.
[81] Antoine Labatie, et al. Characterizing Well-Behaved vs. Pathological Deep Neural Networks, 2018, ICML.
[82] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[83] Serge J. Belongie, et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization, 2017, ICCV.
[84] Minhyung Cho, et al. Riemannian approach to batch normalization, 2017, NIPS.
[85] Elad Hoffer, et al. Norm matters: efficient and accurate normalization schemes in deep networks, 2018, NeurIPS.
[86] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[87] Serge J. Belongie, et al. Residual Networks Behave Like Ensembles of Relatively Shallow Networks, 2016, NIPS.