A New Data Selection Principle for Semi-Supervised Incremental Learning

Current semi-supervised incremental learning approaches select unlabeled examples with high predicted confidence for model re-training. We show that for many applications this data selection strategy is not correct. The confidence score is primarily a measure of how likely the classification of a particular example is to be correct, not of how much that example contributes to training an improved model. This is especially true when the information used by the confidence annotator is correlated with that generated by the classifier. To address this problem, we propose a performance-driven principle for unlabeled data selection, in which only the unlabeled examples that help to improve classification accuracy are selected for semi-supervised learning. Encouraging results are presented for a variety of public benchmark datasets.
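The performance-driven principle can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how accuracy is measured, so the sketch assumes a small held-out labeled validation set, and it uses a one-dimensional nearest-centroid classifier as a stand-in model. All function names (fit_centroids, predict, accuracy, select_by_performance) and all data values are hypothetical.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Stand-in model: one mean per class label over 1-D features."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, xs, ys):
    return sum(predict(centroids, x) == y for x, y in zip(xs, ys)) / len(xs)

def select_by_performance(train_x, train_y, val_x, val_y, unlabeled):
    """Keep an unlabeled example only if retraining with its pseudo-label
    strictly improves accuracy on the (assumed) held-out validation set."""
    train_x, train_y = list(train_x), list(train_y)
    model = fit_centroids(train_x, train_y)
    best = accuracy(model, val_x, val_y)
    selected = []
    for x in unlabeled:
        pseudo = predict(model, x)  # label assigned by the current model
        candidate = fit_centroids(train_x + [x], train_y + [pseudo])
        acc = accuracy(candidate, val_x, val_y)
        if acc > best:  # performance-driven test, not a confidence threshold
            train_x.append(x)
            train_y.append(pseudo)
            model, best = candidate, acc
            selected.append(x)
    return selected, best

# Toy data: 0.1 lies almost on the class-0 centroid (a high-confidence
# example), yet adding it leaves validation accuracy unchanged.
selected, best = select_by_performance(
    train_x=[0.0, 4.0], train_y=[0, 1],
    val_x=[1.0, 2.5, 5.0], val_y=[0, 0, 1],
    unlabeled=[0.1, 6.2, 1.9],
)
```

On this toy data the example closest to an existing centroid is rejected because it does not change validation accuracy. The only example kept is the one that moves the decision boundary enough to correct a validation error. A confidence-threshold strategy would have ranked the examples differently.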
