Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only “virtually” adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.
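
The virtual adversarial loss described above can be approximated with a single power-iteration step, which is where the "no more than two pairs of forward- and back-propagations" figure comes from. Below is a minimal sketch in PyTorch, assuming a classifier `model` that maps a batch of inputs to logits; the names `vat_loss`, `xi`, `eps`, and `n_power` are illustrative choices, not taken from the authors' released code.

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=8.0, n_power=1):
    """Approximate virtual adversarial loss for a batch x (no labels needed)."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)              # current predicted distribution

    # Start from a random unit direction and refine it by power iteration
    # toward the most sensitive (virtually adversarial) direction.
    d = torch.randn_like(x)
    d = F.normalize(d.flatten(1), dim=1).view_as(x)
    for _ in range(n_power):                        # one iteration in the default setting
        d.requires_grad_(True)
        log_q = F.log_softmax(model(x + xi * d), dim=1)
        dist = F.kl_div(log_q, p, reduction="batchmean")   # KL(p || q)
        grad = torch.autograd.grad(dist, d)[0]
        d = F.normalize(grad.flatten(1), dim=1).view_as(x).detach()

    # Perturb by eps along the estimated adversarial direction and
    # penalize the change in the output distribution.
    log_q = F.log_softmax(model(x + eps * d), dim=1)
    return F.kl_div(log_q, p, reduction="batchmean")
```

In training, this term would be added (with a weighting coefficient) to the usual supervised loss; because no labels are used, it can equally be evaluated on unlabeled batches in the semi-supervised setting, and the entropy-minimization variant mentioned in the abstract further adds the conditional entropy of the model's predictions.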
