MaxUp: A Simple Way to Improve Generalization of Neural Network Training

We propose \emph{MaxUp}, an embarrassingly simple yet highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, \emph{MaxUp} is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test \emph{MaxUp} on a range of tasks, including image classification, language modeling, and adversarial certification, on which \emph{MaxUp} consistently outperforms the existing best baseline methods without introducing substantial computational overhead. In particular, we improve the state-of-the-art ImageNet top-1 accuracy without extra data from $85.5\%$ to $85.8\%$. Code will be released soon.
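
Concretely, the objective described above can be written as $\min_\theta \mathbb{E}_{(x,y)}\big[\max_{1 \le i \le m} \mathcal{L}(f_\theta(x'_i), y)\big]$, where $x'_1, \dots, x'_m$ are random augmented copies of $x$. The snippet below is a minimal PyTorch sketch of one training step under this reading of the abstract; the `augment` callable (e.g. additive Gaussian noise) and the copy count `m` are illustrative placeholders, not the released implementation.

```python
import torch
import torch.nn.functional as F

def maxup_step(model, x, y, optimizer, augment, m=4):
    """One MaxUp step: minimize the worst-case loss over m random augmentations."""
    optimizer.zero_grad()
    # Draw m random augmented copies of every example: shape (m, B, ...).
    x_aug = torch.stack([augment(x) for _ in range(m)])
    logits = model(x_aug.flatten(0, 1))                      # (m * B, num_classes)
    losses = F.cross_entropy(logits, y.repeat(m), reduction="none")
    losses = losses.view(m, x.size(0))                       # (m, B)
    # MaxUp: take the maximum (worst-case) loss over the m copies, then average.
    loss = losses.max(dim=0).values.mean()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical augmentation for the Gaussian-perturbation case mentioned above:
# gaussian = lambda x: x + 0.1 * torch.randn_like(x)
```

With $m = 1$ this reduces to ordinary training on augmented data, so the worst-case maximization over the $m$ copies is the only added ingredient.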
