Adversarially Adaptive Normalization for Single Domain Generalization

Single domain generalization aims to learn a model that performs well on many unseen domains with only one domain data for training. Existing works focus on studying the adversarial domain augmentation (ADA) to improve the model’s generalization capability. The impact on domain generalization of the statistics of normalization layers is still underinvestigated. In this paper, we propose a generic normalization approach, adaptive standardization and rescaling normalization (ASR-Norm), to complement the missing part in previous works. ASR-Norm learns both the standardization and rescaling statistics via neural networks. This new form of normalization can be viewed as a generic form of the traditional normalizations. When trained with ADA, the statistics in ASR-Norm are learned to be adaptive to the data coming from different domains, and hence improves the model generalization performance across domains, especially on the target domain with large discrepancy from the source domain. The experimental results show that ASR-Norm can bring consistent improvement to the state-of-the-art ADA approaches by 1.6%, 2.7%, and 6.3% averagely on the Digits, CIFAR-10-C, and PACS benchmarks, respectively. As a generic tool, the improvement introduced by ASR-Norm is agnostic to the choice of ADA methods.

[1]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[2]  Bohyung Han,et al.  Learning to Optimize Domain Specific Normalization for Domain Generalization , 2019, ECCV.

[3]  Il-Chul Moon,et al.  Adversarial Dropout for Supervised and Semi-supervised Learning , 2017, AAAI.

[4]  Dawn Song,et al.  Pretrained Transformers Improve Out-of-Distribution Robustness , 2020, ACL.

[5]  Dimitris N. Metaxas,et al.  Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness , 2020, NeurIPS.

[6]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[7]  Dawn Song,et al.  Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty , 2019, NeurIPS.

[8]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[9]  Thomas G. Dietterich,et al.  Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.

[10]  Gurumurthy Swaminathan,et al.  d-SNE: Domain Adaptation Using Stochastic Neighborhood Embedding , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[12]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[13]  Vittorio Murino,et al.  Model Vulnerability to Distributional Shifts over Image Transformation Sets , 2019, CVPR Workshops.

[14]  D. Song,et al.  The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[16]  John C. Duchi,et al.  Certifying Some Distributional Robustness with Principled Adversarial Training , 2017, ICLR.

[17]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[18]  Hwann-Tzong Chen,et al.  Instance-Level Meta Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Yongxin Yang,et al.  Deeper, Broader and Artier Domain Generalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Tao Xiang,et al.  Learning to Generate Novel Domains for Domain Generalization , 2020, ECCV.

[21]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[22]  Ping Luo,et al.  Differentiable Learning-to-Normalize via Switchable Normalization , 2018, ICLR.

[23]  Eric P. Xing,et al.  Self-Challenging Improves Cross-Domain Generalization , 2020, ECCV.

[24]  G. Brier VERIFICATION OF FORECASTS EXPRESSED IN TERMS OF PROBABILITY , 1950 .

[25]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[26]  Xi Peng,et al.  Learning to Learn Single Domain Generalization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[28]  Balaji Lakshminarayanan,et al.  AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty , 2020, ICLR.

[29]  Thomas L. Botch,et al.  A deeper look at vision and memory , 2021, Nature Neuroscience.

[30]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[32]  Hyo-Eun Kim,et al.  Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks , 2018, NeurIPS.

[33]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[34]  Fabio Maria Carlucci,et al.  Domain Generalization by Solving Jigsaw Puzzles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Alexander D'Amour,et al.  Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift , 2020, ArXiv.

[36]  Geoffrey E. Hinton,et al.  Big Self-Supervised Models are Strong Semi-Supervised Learners , 2020, NeurIPS.

[37]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[38]  Xiaoning Qian,et al.  Contextual Dropout: An Efficient Sample-Dependent Dropout Module , 2021, ICLR.

[39]  Donald A. Adjeroh,et al.  Unified Deep Supervised Domain Adaptation and Generalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[41]  Zhiyuan Zhang,et al.  Understanding and Improving Layer Normalization , 2019, NeurIPS.

[42]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[43]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[44]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[45]  Ruimao Zhang,et al.  SSN: Learning Sparse Switchable Normalization via SparsestMax , 2019, International Journal of Computer Vision.

[46]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[47]  P. Alam ‘A’ , 2021, Composites Engineering: An A–Z Guide.

[48]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[49]  Quoc V. Le,et al.  DropBlock: A regularization method for convolutional networks , 2018, NeurIPS.

[50]  Kilian Q. Weinberger,et al.  On Feature Normalization and Data Augmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Jonathan J. Hull,et al.  A Database for Handwritten Text Recognition Research , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[52]  Swami Sankaranarayanan,et al.  MetaReg: Towards Domain Generalization using Meta-Regularization , 2018, NeurIPS.

[53]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[54]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[55]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[56]  Michael I. Jordan,et al.  Transferable Normalization: Towards Improving Transferability of Deep Neural Networks , 2019, NeurIPS.

[57]  Silvio Savarese,et al.  Generalizing to Unseen Domains via Adversarial Data Augmentation , 2018, NeurIPS.

[58]  Judy Hoffman,et al.  Learning to Balance Specificity and Invariance for In and Out of Domain Generalization , 2020, ECCV.

[59]  Alexei A. Efros,et al.  Test-Time Training with Self-Supervision for Generalization under Distribution Shifts , 2019, ICML.

[60]  Bo Chen,et al.  Bayesian Attention Modules , 2020, NeurIPS.

[61]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  José M. F. Moura,et al.  Adversarial Multiple Source Domain Adaptation , 2018, NeurIPS.

[63]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.