Self-Training: A Survey

Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations. Because this framework is relevant to many applications, it has received considerable interest in both academia and industry. Among the existing techniques, self-training methods have arguably attracted the most attention in recent years. These models are designed to place the decision boundary in low-density regions without making additional assumptions about the data distribution, and they use the unsigned output score of a learned classifier, or its margin, as an indicator of confidence. Self-training algorithms learn a classifier iteratively: at each round, pseudo-labels are assigned to the unlabeled training samples whose margin exceeds a given threshold, and these pseudo-labeled examples are then added to the labeled training set and used to train a new classifier. In this paper, we present self-training methods for binary and multi-class classification, as well as their variants and two related approaches, namely consistency-based approaches and transductive learning. We examine the impact of significant self-training features on various methods, using different general and image classification benchmarks, and we discuss our ideas for future research in self-training. To the best of our knowledge, this is the first thorough and complete survey on this subject.
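To make the iterative pseudo-labeling procedure concrete, the sketch below shows one plausible instantiation of the loop described above. It is a minimal illustration, not the survey's specific algorithm: the choice of scikit-learn's LogisticRegression as the base classifier, the use of the maximum class probability as a confidence proxy for the margin, and the threshold value of 0.9 are all assumptions introduced here for clarity.

```python
# Minimal self-training sketch, assuming a scikit-learn-style classifier
# that exposes predict_proba; names and the threshold are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, max_iter=10):
    """Iteratively pseudo-label confident unlabeled samples and retrain."""
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    X_u = X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)          # class-probability estimates
        confidence = proba.max(axis=1)          # proxy for the classifier's margin
        confident = confidence >= threshold     # samples above the threshold
        if not confident.any():
            break                               # no confident samples remain
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[confident]])  # enrich the labeled set
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~confident]                   # drop pseudo-labeled samples
        clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)  # retrain
    return clf
```

In practice, the surveyed methods differ mainly in how the confidence threshold is set (fixed, curriculum-based, or class-balanced) and in the stopping criterion for the loop.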
