MaxUp: A Simple Way to Improve Generalization of Neural Network Training

We propose \emph{MaxUp}, an embarrassingly simple yet highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, \emph{MaxUp} is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test \emph{MaxUp} on a range of tasks, including image classification, language modeling, and adversarial certification, on which \emph{MaxUp} consistently outperforms the existing best baseline methods without introducing substantial computational overhead. In particular, we improve the state-of-the-art ImageNet top-1 accuracy without extra data from $85.5\%$ to $85.8\%$. Code will be released soon.
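
Concretely, the objective described above can be written as $\min_\theta \mathbb{E}_{(x,y)}\big[\max_{1 \le i \le m} \mathcal{L}(f_\theta(x'_i), y)\big]$, where $x'_1, \dots, x'_m$ are random augmented copies of $x$. The snippet below is a minimal PyTorch sketch of one training step under this reading of the abstract; the `augment` callable (e.g. additive Gaussian noise) and the copy count `m` are illustrative placeholders, not the released implementation.

```python
import torch
import torch.nn.functional as F

def maxup_step(model, x, y, optimizer, augment, m=4):
    """One MaxUp step: minimize the worst-case loss over m random augmentations."""
    optimizer.zero_grad()
    # Draw m random augmented copies of every example: shape (m, B, ...).
    x_aug = torch.stack([augment(x) for _ in range(m)])
    logits = model(x_aug.flatten(0, 1))                      # (m * B, num_classes)
    losses = F.cross_entropy(logits, y.repeat(m), reduction="none")
    losses = losses.view(m, x.size(0))                       # (m, B)
    # MaxUp: take the maximum (worst-case) loss over the m copies, then average.
    loss = losses.max(dim=0).values.mean()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical augmentation for the Gaussian-perturbation case mentioned above:
# gaussian = lambda x: x + 0.1 * torch.randn_like(x)
```

With $m = 1$ this reduces to ordinary training on augmented data, so the worst-case maximization over the $m$ copies is the only added ingredient.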
