DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks

Generalization of neural networks is crucial for deploying them safely in the real world. Common training strategies to improve generalization involve the use of data augmentations, ensembling and model averaging. In this work, we first establish a surprisingly simple but strong benchmark for generalization which utilizes diverse augmentations within a training minibatch, and show that this can learn a more balanced distribution of features. Further, we propose the Diversify-Aggregate-Repeat Training (DART) strategy, which first trains diverse models using different augmentations (or domains) to explore the loss basin, and then Aggregates their weights to combine their expertise and obtain improved generalization. We find that Repeating the Aggregation step throughout training improves the overall optimization trajectory and also ensures that the individual models have a sufficiently low loss barrier for their combination to generalize better. We shed light on our approach by casting it in the framework proposed by Shen et al. and theoretically show that it indeed generalizes better. In addition to improvements in In-Domain generalization, we also demonstrate SOTA performance on the Domain Generalization benchmarks in the popular DomainBed framework. Our method is generic and can easily be integrated with several base training algorithms to achieve performance gains.
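The Diversify, Aggregate and Repeat steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear regression model in NumPy in place of a neural network, and the helper names (`average_weights`, `sgd_step`, `dart_sketch`), the three augmentations, the aggregation interval and the learning rate are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

def average_weights(weight_sets):
    """Aggregate: uniformly average a list of parameter dicts (name -> array)."""
    keys = weight_sets[0].keys()
    return {k: np.mean([w[k] for w in weight_sets], axis=0) for k in keys}

def sgd_step(weights, x, y, lr):
    """One gradient step of least-squares regression for a linear model."""
    err = x @ weights["w"] + weights["b"] - y
    grad_w = x.T @ err / len(x)
    grad_b = err.mean()
    return {"w": weights["w"] - lr * grad_w, "b": weights["b"] - lr * grad_b}

def dart_sketch(x, y, augmentations, steps=300, aggregate_every=20, lr=0.1, seed=0):
    """Toy Diversify-Aggregate-Repeat loop (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    init = {"w": np.zeros(x.shape[1]), "b": 0.0}
    # One branch per augmentation, all starting from the same weights.
    branches = [dict(init) for _ in augmentations]
    for step in range(1, steps + 1):
        # Diversify: each branch trains on its own augmented view of the data.
        branches = [
            sgd_step(w, aug(x, rng), y, lr)
            for w, aug in zip(branches, augmentations)
        ]
        # Aggregate and Repeat: periodically average the branches and
        # restart every branch from the averaged weights.
        if step % aggregate_every == 0:
            merged = average_weights(branches)
            branches = [dict(merged) for _ in augmentations]
    return average_weights(branches)

# Hypothetical augmentations standing in for image augmentations.
augmentations = [
    lambda x, rng: x,                                    # identity
    lambda x, rng: x + rng.normal(0.0, 0.05, x.shape),   # small input noise
    lambda x, rng: x * rng.uniform(0.9, 1.1),            # random rescaling
]

data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(256, 2))
y = x @ np.array([2.0, -1.0]) + 0.5
final = dart_sketch(x, y, augmentations)
```

Resetting every branch to the averaged weights at each interval is what keeps the branches close enough for later averaging to remain meaningful in this sketch; how often to aggregate is a design choice that the abstract does not specify.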

[1]  R. Venkatesh Babu,et al.  Efficient and Effective Augmentation Strategy for Adversarial Training , 2022, NeurIPS.

[2]  Sungrae Park,et al.  Domain Generalization by Mutual-Information Regularization with Pre-trained Models , 2022, European Conference on Computer Vision.

[3]  Ari S. Morcos,et al.  Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time , 2022, ICML.

[4]  Anh Tuan Tran,et al.  Exploiting Domain-Specific Features to Enhance Domain Generalization , 2021, NeurIPS.

[5]  Hanie Sedghi,et al.  The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks , 2021, ICLR.

[6]  Michael W. Mahoney,et al.  Noisy Feature Mixup , 2021, ICLR.

[7]  Donggeun Yoo,et al.  Reducing Domain Gap by Reducing Style Bias , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  B. Schiele,et al.  Relating Adversarially Robust Generalization to Flat Minima , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Y. Qiao,et al.  Domain Generalization with MixStyle , 2021, ICLR.

[10]  Matthieu Cord,et al.  MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Ilya Sutskever,et al.  Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.

[12]  Samy Bengio,et al.  Understanding deep learning (still) requires rethinking generalization , 2021, Commun. ACM.

[13]  Ali Farhadi,et al.  Learning Neural Network Subspaces , 2021, ICML.

[14]  Sungrae Park,et al.  SWAD: Domain Generalization by Seeking Flat Minima , 2021, NeurIPS.

[15]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[16]  Ariel Kleiner,et al.  Sharpness-Aware Minimization for Efficiently Improving Generalization , 2020, ICLR.

[17]  N. Joseph Tatro,et al.  Optimizing Mode Connectivity via Neuron Alignment , 2020, NeurIPS.

[18]  Judy Hoffman,et al.  Learning to Balance Specificity and Invariance for In and Out of Domain Generalization , 2020, ECCV.

[19]  Marc Niethammer,et al.  Robust and Generalizable Visual Representation Learning via Random Convolutions , 2020, ICLR.

[20]  Elisa Ricci,et al.  Towards Recognizing Unseen Categories in Unseen Domains , 2020, ECCV.

[21]  Timothy M. Hospedales,et al.  Learning to Generate Novel Domains for Domain Generalization , 2020, ECCV.

[22]  S. Levine,et al.  Adaptive Risk Minimization: Learning to Adapt to Domain Shift , 2020, NeurIPS.

[23]  Ching-Yao Chuang,et al.  Estimating Generalization under Distribution Shifts via Domain-Invariant Representations , 2020, ICML.

[24]  Eric P. Xing,et al.  Self-Challenging Improves Cross-Domain Generalization , 2020, ECCV.

[25]  David Lopez-Paz,et al.  In Search of Lost Domain Generalization , 2020, ICLR.

[26]  Aleksander Madry,et al.  Noise or Signal: The Role of Image Backgrounds in Object Recognition , 2020, ICLR.

[27]  Prateek Jain,et al.  The Pitfalls of Simplicity Bias in Neural Networks , 2020, NeurIPS.

[28]  Torsten Hoefler,et al.  Augment Your Batch: Improving Generalization Through Instance Repetition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yufei Wang,et al.  Heterogeneous Domain Generalization Via Domain Mixup , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[30]  Karthikeyan Natesan Ramamurthy,et al.  Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness , 2020, ICLR.

[31]  Tatsunori B. Hashimoto,et al.  Distributionally Robust Neural Networks , 2020, ICLR.

[32]  Xilin Chen,et al.  Cross-Domain Face Presentation Attack Detection via Multi-Domain Disentangled Representation Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Xi Peng,et al.  Learning to Learn Single Domain Generalization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Sunita Sarawagi,et al.  Efficient Domain Generalization via Common-Specific Low-Rank Decomposition , 2020, ICML.

[35]  Aaron C. Courville,et al.  Out-of-Distribution Generalization via Risk Extrapolation (REx) , 2020, ICML.

[36]  Anil K. Jain,et al.  Towards Universal Representation Learning for Deep Face Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  C. Sminchisescu,et al.  Relative Flatness and Generalization , 2020, NeurIPS.

[38]  Daniel M. Roy,et al.  Linear Mode Connectivity and the Lottery Ticket Hypothesis , 2019, ICML.

[39]  Hossein Mobahi,et al.  Fantastic Generalization Measures and Where to Find Them , 2019, ICLR.

[40]  Tatsunori B. Hashimoto,et al.  Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization , 2019, ArXiv.

[41]  Quoc V. Le,et al.  Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[42]  K. Keutzer,et al.  Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Bohyung Han,et al.  Learning to Optimize Domain Specific Normalization for Domain Generalization , 2019, ECCV.

[44]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[45]  Zhitang Chen,et al.  Domain Generalization via Multidomain Discriminant Analysis , 2019, UAI.

[46]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Jakub M. Tomczak,et al.  DIVA: Domain Invariant Variational Autoencoders , 2019, DGS@ICLR.

[48]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Vittorio Murino,et al.  Model Vulnerability to Distributional Shifts over Image Transformation Sets , 2019, CVPR Workshops.

[50]  Quynh Nguyen,et al.  On Connected Sublevel Sets in Deep Learning , 2019, ICML.

[51]  Bo Wang,et al.  Moment Matching for Multi-Source Domain Adaptation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[52]  Swami Sankaranarayanan,et al.  MetaReg: Towards Domain Generalization using Meta-Regularization , 2018, NeurIPS.

[53]  Matthias Bethge,et al.  ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.

[54]  D. Tao,et al.  Deep Domain Generalization via Conditional Invariant Adversarial Networks , 2018, ECCV.

[55]  Pietro Perona,et al.  Recognition in Terra Incognita , 2018, ECCV.

[56]  Thomas G. Dietterich,et al.  Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.

[57]  Ioannis Mitliagkas,et al.  Manifold Mixup: Better Representations by Interpolating Hidden States , 2018, ICML.

[58]  Alex ChiChung Kot,et al.  Domain Generalization with Adversarial Feature Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[59]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Policies from Data , 2018, ArXiv.

[60]  Silvio Savarese,et al.  Generalizing to Unseen Domains via Adversarial Data Augmentation , 2018, NeurIPS.

[61]  Dacheng Tao,et al.  Domain Generalization via Conditional Invariant Representations , 2018, AAAI.

[62]  Andrew Gordon Wilson,et al.  Averaging Weights Leads to Wider Optima and Better Generalization , 2018, UAI.

[63]  Fred A. Hamprecht,et al.  Essentially No Barriers in Neural Network Energy Landscape , 2018, ICML.

[64]  Andrew Gordon Wilson,et al.  Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs , 2018, NeurIPS.

[65]  Siddhartha Chaudhuri,et al.  Generalizing Across Domains via Cross-Gradient Training , 2018, ICLR.

[66]  Gilles Blanchard,et al.  Domain Generalization by Marginal Transfer Learning , 2017, J. Mach. Learn. Res.

[67]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[68]  Yongxin Yang,et al.  Learning to Generalize: Meta-Learning for Domain Generalization , 2017, AAAI.

[69]  Yongxin Yang,et al.  Deeper, Broader and Artier Domain Generalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[70]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[71]  Nathan Srebro,et al.  Exploring Generalization in Deep Learning , 2017, NIPS.

[72]  Sethuraman Panchanathan,et al.  Deep Hashing Network for Unsupervised Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Gintare Karolina Dziugaite,et al.  Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.

[74]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[75]  Jorge Nocedal,et al.  On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.

[76]  Kate Saenko,et al.  Deep CORAL: Correlation Alignment for Deep Domain Adaptation , 2016, ECCV Workshops.

[77]  David A. Forsyth,et al.  Swapout: Learning an ensemble of deep architectures , 2016, NIPS.

[78]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[79]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res.

[80]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[81]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[82]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[83]  Ye Xu,et al.  Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias , 2013, 2013 IEEE International Conference on Computer Vision.

[84]  Alexei A. Efros,et al.  Undoing the Damage of Dataset Bias , 2012, ECCV.

[85]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010.

[86]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[87]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992.

[88]  Sébastien Bubeck,et al.  Data Augmentation as Feature Manipulation: a story of desert cows and grass cows , 2022, ArXiv.

[89]  Junchi Yan,et al.  The Diversified Ensemble Neural Network , 2020, NeurIPS.

[90]  Yun Fu,et al.  Deep Domain Generalization With Structured Low-Rank Constraint , 2018, IEEE Transactions on Image Processing.

[91]  Mario Marchand,et al.  Domain-adversarial training of neural networks , 2016.

[92]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009.