Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer

Deep networks often perform well on the data manifold on which they are trained, yet give incorrect (and often highly confident) answers when evaluated on points outside the training distribution. This is exemplified by the adversarial examples phenomenon, but it also manifests in poor generalization and sensitivity to domain shift. We propose Manifold Mixup, which encourages the network to produce more reasonable and less confident predictions at points with combinations of attributes not seen in the training set. This is accomplished by training on convex combinations of the hidden-state representations of data samples. Using this method, we demonstrate improved semi-supervised learning, improved learning with limited labeled data, and improved robustness to adversarial examples. Manifold Mixup requires no significant additional computation. Analytical experiments on both real and synthetic data directly support our hypothesis for why Manifold Mixup improves results.
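Concretely, the procedure can be sketched as follows: at each training step, pick a random depth k, run a randomly paired minibatch forward to that depth, replace the hidden states with the convex combination lam * h_i + (1 - lam) * h_j where lam ~ Beta(alpha, alpha), continue the forward pass, and apply the same combination to the targets. Below is a minimal PyTorch sketch under these assumptions; the decomposition of the model into `blocks` and the names `manifold_mixup_forward`, `mixed_loss`, and `alpha` are illustrative, not taken from the authors' released code.

```python
# A minimal sketch of Manifold Mixup training, assuming the model is exposed
# as an ordered list of layer blocks. Illustrative, not the authors' code.
import random

import torch
import torch.nn.functional as F
from torch.distributions import Beta


def manifold_mixup_forward(blocks, x, y, alpha=2.0):
    """Forward pass that mixes hidden states at one randomly chosen depth.

    Choosing depth 0 mixes the raw inputs, recovering standard Input Mixup
    as a special case.
    """
    k = random.randrange(len(blocks))          # depth at which to mix
    lam = Beta(alpha, alpha).sample().item()   # mixing coefficient lambda
    perm = torch.randperm(x.size(0))           # random pairing within the batch

    h = x
    for i, block in enumerate(blocks):
        if i == k:
            # Convex combination of the hidden representations of paired samples.
            h = lam * h + (1.0 - lam) * h[perm]
        h = block(h)

    logits = h
    return logits, y, y[perm], lam


def mixed_loss(logits, y_a, y_b, lam):
    # The targets are mixed with the same coefficient as the hidden states;
    # for cross-entropy this equals training on mixed one-hot labels.
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
```

In the paper the set of eligible mixing layers is fixed per architecture; for simplicity this sketch treats every block boundary as eligible and draws lam from Beta(alpha, alpha), as in Input Mixup.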
