Robust Semi-Supervised Learning with Out of Distribution Data

Semi-supervised learning (SSL) based on deep neural networks (DNNs) has recently been proven effective. However, recent work [Oliver et al., 2018] shows that the performance of SSL can degrade substantially when the unlabeled set contains out-of-distribution examples (OODs). In this work, we first study the key causes of the negative impact of OODs on SSL. We find that (1) OODs close to the decision boundary have a larger effect on the performance of existing SSL algorithms than OODs far away from the decision boundary, and (2) Batch Normalization (BN), a popular module in deep networks, can substantially degrade the performance of a DNN for SSL when the unlabeled set contains OODs. To address these causes, we propose a novel unified robust SSL approach applicable to many existing SSL algorithms that improves their robustness against OODs. In particular, we propose a simple modification to batch normalization, called weighted batch normalization, which improves the robustness of BN against OODs. We also develop two efficient hyperparameter optimization algorithms with different tradeoffs between computational efficiency and accuracy: a meta-approximation and an implicit-differentiation-based approximation. Both algorithms learn to reweight the unlabeled samples to improve the robustness of SSL against OODs. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach significantly improves the robustness of four representative SSL algorithms against OODs, compared with four state-of-the-art robust SSL approaches. We also perform an ablation study to show which components of our approach are most important for its success.
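As a hedged illustration (not the authors' released code), a minimal PyTorch sketch of what a weighted batch normalization layer could look like follows: the batch statistics are weighted averages over samples, so examples with near-zero weight (e.g., suspected OODs) barely influence the normalization. The class name, the `(x, w)` interface, and the running-statistics update are our assumptions.

```python
import torch
import torch.nn as nn


class WeightedBatchNorm2d(nn.Module):
    """BN whose batch statistics are per-sample-weighted averages, so
    low-weight (likely OOD) examples contribute little to normalization."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x, w):
        # x: (N, C, H, W); w: (N,) non-negative per-sample weights.
        if self.training:
            w = (w / w.sum().clamp_min(self.eps)).view(-1, 1)  # normalize to sum 1
            sample_mean = x.mean(dim=(2, 3))                   # (N, C) spatial means
            sample_sq = (x ** 2).mean(dim=(2, 3))              # (N, C) second moments
            mean = (w * sample_mean).sum(dim=0)                # (C,) weighted batch mean
            var = ((w * sample_sq).sum(dim=0) - mean ** 2).clamp_min(0.0)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(var.view(1, -1, 1, 1) + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x_hat + self.beta.view(1, -1, 1, 1)
```

At evaluation time the layer falls back to the running statistics, exactly as standard BN does; only the training-time batch statistics are reweighted.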
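The abstract describes the two reweighting algorithms only at a high level. The sketch below illustrates the meta-approximation idea in the spirit of learning-to-reweight [4]: take a virtual SGD step on the weight-modulated unlabeled loss, measure the resulting loss on clean labeled data, and derive each sample's weight from the negative gradient of that loss with respect to its pilot weight. Everything here (the `per_sample_ssl_loss` callable, the inner step size, the weight normalization) is an illustrative assumption rather than the paper's implementation; the implicit-differentiation variant would instead approximate the same hypergradient via an inverse-Hessian approximation in the style of [34].

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0


def meta_reweight_step(model, x_l, y_l, x_u, per_sample_ssl_loss, lr=0.1):
    """One meta-approximation step: weight an unlabeled batch via a
    one-step lookahead evaluated on a clean labeled batch."""
    params = dict(model.named_parameters())
    eps = torch.zeros(x_u.size(0), device=x_u.device, requires_grad=True)

    # Virtual SGD step on the eps-weighted per-sample unlabeled loss.
    losses_u = per_sample_ssl_loss(model, x_u)              # shape (N_u,)
    grads = torch.autograd.grad((eps * losses_u).sum(),
                                list(params.values()), create_graph=True)
    fast_params = {k: p - lr * g for (k, p), g in zip(params.items(), grads)}

    # Loss of the virtually updated model on the clean labeled batch.
    val_loss = F.cross_entropy(functional_call(model, fast_params, (x_l,)), y_l)

    # A sample gets high weight if up-weighting it would *reduce* the
    # labeled loss: rectified negative hypergradient, then normalized.
    eps_grad = torch.autograd.grad(val_loss, eps)[0]
    w = torch.clamp(-eps_grad, min=0.0)
    return (w / w.sum().clamp_min(1e-8)).detach()
```

The returned weights can then modulate both the unlabeled loss term and the weighted batch normalization statistics sketched above.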

[1] David Berthelot, et al. MixMatch: A Holistic Approach to Semi-Supervised Learning, 2019, NeurIPS.

[2] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.

[3] Zhi-Hua Zhou, et al. Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data, 2020, ICML.

[4] Bin Yang, et al. Learning to Reweight Examples for Robust Deep Learning, 2018, ICML.

[5] Noel E. O'Connor, et al. Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning, 2020, IJCNN.

[6] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[7] Dumitru Erhan, et al. Going deeper with convolutions, 2015, CVPR.

[8] Shin Ishii, et al. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[10] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, ICASSP.

[11] Hongyu Guo, et al. MixUp as Locally Linear Out-Of-Manifold Regularization, 2018, AAAI.

[12] Yoshua Bengio, et al. Semi-supervised Learning by Entropy Minimization, 2004, CAP.
[13] Xiaojin Zhu. Semi-Supervised Learning Literature Survey, 2006.

[14] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, arXiv.

[15] Kaizhu Huang, et al. Maximum margin semi-supervised learning with irrelevant data, 2015, Neural Networks.

[16] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, arXiv.

[17] Lina Yao, et al. Distributionally Robust Semi-Supervised Learning for People-Centric Sensing, 2018, AAAI.

[18] Matthias Hein, et al. Towards neural networks that provably know when they don't know, 2020, ICLR.

[19] Harri Valpola, et al. Weight-averaged consistency targets improve semi-supervised deep learning results, 2017, arXiv.

[20] Dong-Hyun Lee, et al. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks, 2013.

[21] Ivor W. Tsang, et al. Robust Semi-Supervised Learning through Label Aggregation, 2016, AAAI.

[22] Yoshua Bengio, et al. Gradient-Based Optimization of Hyperparameters, 2000, Neural Computation.

[23] Colin Raffel, et al. Realistic Evaluation of Deep Semi-Supervised Learning Algorithms, 2018, NeurIPS.

[24] Tom E. Bishop, et al. Blind Image Restoration Using a Block-Stationary Signal Model, 2006, ICASSP.

[25] Paolo Frasconi, et al. Bilevel Programming for Hyperparameter Optimization and Meta-Learning, 2018, ICML.

[26] Shaogang Gong, et al. Semi-Supervised Learning under Class Distribution Mismatch, 2020, AAAI.

[27] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.

[28] Timo Aila, et al. Temporal Ensembling for Semi-Supervised Learning, 2016, ICLR.

[29] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.

[30] Tolga Tasdizen, et al. Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning, 2016, NIPS.

[31] Michael R. Lyu, et al. Can irrelevant data help semi-supervised learning, why and how?, 2011, CIKM.

[32] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[33] Quoc V. Le, et al. Unsupervised Data Augmentation for Consistency Training, 2019, NeurIPS.

[34] David Duvenaud, et al. Optimizing Millions of Hyperparameters by Implicit Differentiation, 2019, AISTATS.

[35] Qi Xie, et al. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting, 2019, NeurIPS.