Improved Training for Self Training by Confidence Assessments

It is well known that for some tasks, labeled datasets may be hard to gather. Self-training, or pseudo-labeling, tackles the problem of insufficient training data. In the self-training scheme, the classifier is first trained on a limited labeled dataset and then trained on an additional unlabeled dataset, using its own predictions as labels, provided those predictions are made with sufficiently high confidence. Using a credible interval based on MC-dropout as a confidence measure, the proposed method achieves substantially better results compared to several other pseudo-labeling methods and outperforms the previous state-of-the-art pseudo-labeling technique by 7% on the MNIST dataset. In addition to learning from large, static unlabeled datasets, the suggested approach may be better suited than others to online learning, where the classifier keeps receiving new unlabeled data. The approach may also be applicable to the recent method of pseudo-gradients for training long sequential neural networks.
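To make the selection rule concrete, below is a minimal sketch of how a credible-interval filter over MC-dropout samples might look in PyTorch. The function name, the number of stochastic passes, the interval level, and the acceptance threshold are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of MC-dropout-based pseudo-label selection (not the paper's code).
# Assumed: T stochastic passes, a 95% credible interval, and a 0.9 acceptance threshold.
import torch
import torch.nn.functional as F

def mc_dropout_pseudo_labels(model, unlabeled_batch, T=50, lower_q=0.025, threshold=0.9):
    """Return (pseudo_labels, accept_mask) for a batch of unlabeled inputs.

    A sample is accepted only if the lower bound of the credible interval
    of its predicted-class probability exceeds `threshold`.
    """
    model.train()  # keep dropout active at inference time (MC-dropout)
    with torch.no_grad():
        # T stochastic forward passes -> (T, batch, num_classes) probabilities
        probs = torch.stack(
            [F.softmax(model(unlabeled_batch), dim=-1) for _ in range(T)]
        )

    mean_probs = probs.mean(dim=0)             # (batch, num_classes)
    pseudo_labels = mean_probs.argmax(dim=-1)  # most likely class per sample

    # Empirical lower bound of the credible interval for the predicted class
    lower = torch.quantile(probs, lower_q, dim=0)                   # (batch, num_classes)
    lower_pred = lower.gather(1, pseudo_labels.unsqueeze(1)).squeeze(1)

    accept_mask = lower_pred > threshold       # keep only confident samples
    return pseudo_labels, accept_mask
```

The accepted samples and their pseudo-labels would then be mixed with the original labeled data for the next training round, as described in the abstract.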
