Adversarial Feature Distribution Alignment for Semi-Supervised Learning

Training deep neural networks with only a few labeled samples can lead to overfitting, which is a central difficulty in semi-supervised learning (SSL), where labeled data are scarce by design. In this paper, we show that a consequence of overfitting in SSL is a feature distribution misalignment between labeled and unlabeled samples. Hence, we propose a new adversarial feature distribution alignment method, which is particularly effective when only a small number of labeled samples is available. We evaluate our method on CIFAR-10 and SVHN. On SVHN we achieve test errors of 3.88% (250 labeled samples) and 3.39% (1000 labeled samples), close to the fully supervised model's 2.89% (73k labeled samples); in comparison, the current state of the art achieves 4.29% and 3.74%, respectively. Finally, we provide theoretical insight into why the feature distribution misalignment occurs and show that our method reduces it.
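To make the core idea concrete, the following is a minimal sketch of adversarial feature distribution alignment between labeled and unlabeled batches, written in PyTorch in the spirit of gradient-reversal domain-adversarial training. Everything here is an illustrative assumption rather than the paper's exact method: the backbone, the discriminator architecture, the gradient-reversal formulation, and the loss weight lam are hypothetical stand-ins.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GradReverse(torch.autograd.Function):
        """Gradient reversal: identity in the forward pass, negated
        (and scaled) gradient in the backward pass."""
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad):
            return -ctx.lam * grad, None

    def grad_reverse(x, lam=1.0):
        return GradReverse.apply(x, lam)

    # Hypothetical components: any CNN backbone would do; a tiny MLP
    # stands in here so the sketch runs as-is on CIFAR-sized inputs.
    feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
    classifier = nn.Linear(128, 10)
    discriminator = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    opt = torch.optim.Adam(list(feature_extractor.parameters()) +
                           list(classifier.parameters()) +
                           list(discriminator.parameters()), lr=1e-3)

    def train_step(x_lab, y_lab, x_unlab, lam=0.1):
        """One SSL step: supervised loss on the labeled batch plus an
        adversarial loss that aligns labeled/unlabeled features."""
        f_lab = feature_extractor(x_lab)
        f_unlab = feature_extractor(x_unlab)

        sup_loss = F.cross_entropy(classifier(f_lab), y_lab)

        # The discriminator learns to tell labeled (1) from unlabeled (0)
        # features; the gradient-reversal layer makes the feature
        # extractor learn to fool it, pulling the two distributions together.
        feats = torch.cat([grad_reverse(f_lab, lam), grad_reverse(f_unlab, lam)])
        domain = torch.cat([torch.ones(len(f_lab), 1), torch.zeros(len(f_unlab), 1)])
        align_loss = F.binary_cross_entropy_with_logits(discriminator(feats), domain)

        loss = sup_loss + align_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # Hypothetical usage with random CIFAR-10-sized tensors:
    x_l, y_l = torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,))
    x_u = torch.randn(128, 3, 32, 32)
    print(train_step(x_l, y_l, x_u))

The point the sketch illustrates is the min-max structure: the discriminator is trained to separate labeled-batch features from unlabeled-batch features, while the reversed gradient pushes the shared feature extractor to make the two distributions indistinguishable, directly targeting the misalignment described above.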
