Manifold Adversarial Learning

Recently proposed adversarial training methods are robust to both adversarial and original examples and achieve state-of-the-art results in supervised and semi-supervised learning. However, all existing adversarial training methods consider only how the worst-case perturbed examples (i.e., adversarial examples) affect the model output. Despite their success, we argue that this setting may limit generalization, since the output (or label) space is comparatively uninformative. In this paper, we propose a novel method called Manifold Adversarial Training (MAT). MAT builds an adversarial framework around how the worst-case perturbation affects the distributional manifold rather than the output space. Specifically, a latent data space is first modeled with a Gaussian Mixture Model (GMM). On one hand, MAT perturbs the input samples in the way that disturbs this distributional manifold the most; on the other hand, the deep model is trained to promote manifold smoothness in the latent space, measured by the variation of the Gaussian mixture under local perturbations around each data point. Importantly, since the latent space is more informative than the output space, the proposed MAT can learn a more robust and compact data representation, leading to further performance improvements. The proposed MAT is also significant in that it can be viewed as a superset of a recently proposed discriminative feature learning approach called center loss. We conducted a series of experiments in both supervised and semi-supervised learning on three benchmark data sets, showing that the proposed MAT achieves remarkable performance, considerably better than that of the state-of-the-art adversarial approaches.
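The abstract only outlines the procedure, so the following is a minimal PyTorch-style sketch of how one such training step might look, assuming a VAT-style single power-iteration step to find the worst-case input perturbation and a KL divergence between GMM component responsibilities as the manifold-variation measure. All names and hyperparameters here (mat_step, gmm_log_resp, xi, eps, alpha) are hypothetical and do not come from the paper, and the GMM parameters are treated as given tensors that would in practice be learned alongside the encoder.

```python
# Hypothetical sketch of a MAT-style training step; not the authors' implementation.
import torch
import torch.nn.functional as F


def _l2_normalize(d):
    # Normalize each per-sample perturbation to unit L2 norm.
    norm = d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / norm.clamp_min(1e-8)


def gmm_log_resp(z, means, log_vars, log_pi):
    # Log responsibilities of K diagonal Gaussian components for latent codes z.
    # z: (B, D), means/log_vars: (K, D), log_pi: (K,).
    diff = z.unsqueeze(1) - means.unsqueeze(0)                      # (B, K, D)
    log_prob = -0.5 * ((diff ** 2) / log_vars.exp().unsqueeze(0)
                       + log_vars.unsqueeze(0)).sum(-1)             # (B, K)
    log_joint = log_prob + log_pi
    return log_joint - torch.logsumexp(log_joint, dim=1, keepdim=True)


def mat_step(encoder, classifier, gmm, x, y, xi=1e-6, eps=2.0, alpha=1.0):
    means, log_vars, log_pi = gmm

    # Supervised loss on clean inputs.
    z = encoder(x)
    ce = F.cross_entropy(classifier(z), y)

    # Reference GMM assignment of the clean latent codes (no gradient).
    with torch.no_grad():
        p = gmm_log_resp(z, means, log_vars, log_pi).exp()

    # One gradient step to find the input perturbation that most disturbs
    # the latent GMM assignment (VAT-style approximation of the worst case).
    d = xi * _l2_normalize(torch.randn_like(x))
    d.requires_grad_(True)
    dist = F.kl_div(gmm_log_resp(encoder(x + d), means, log_vars, log_pi),
                    p, reduction="batchmean")
    grad = torch.autograd.grad(dist, d)[0]
    r_adv = eps * _l2_normalize(grad)

    # Manifold-smoothness penalty at the (approximate) worst-case perturbation.
    smooth = F.kl_div(gmm_log_resp(encoder(x + r_adv), means, log_vars, log_pi),
                      p, reduction="batchmean")
    return ce + alpha * smooth
```

The returned loss would simply be backpropagated in place of the usual cross-entropy; the exact manifold-variation measure and the way the GMM parameters are updated in the paper may well differ, and this sketch only mirrors the structure described in the abstract.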
