An Iterative Algorithm for Extending Learners to a Semi-Supervised Setting

In this article, we present an iterative self-training algorithm that extends learners from the supervised setting to a semi-supervised one. The algorithm uses the current fit to predict the response for observations where it is missing (the unlabeled data) and then incorporates those predictions at subsequent fitting stages. Convergence properties of the algorithm are investigated for particular learners, such as linear/logistic regression and linear smoothers, with particular emphasis on kernel smoothers. Implementation issues with other learners, such as generalized additive models, tree partitioning methods, and partial least squares, are also addressed, as is the connection between the proposed algorithm and graph-based semi-supervised learning methods. The algorithm is illustrated on a number of real datasets with varying proportions of labeled responses.
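
The abstract describes the procedure only at a high level. Below is a minimal sketch of one plausible instantiation, using a Nadaraya-Watson kernel smoother (one of the linear smoothers the paper emphasizes); the function names, bandwidth, initialization, and stopping rule are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2)

def nw_smooth(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimate of E[Y | X = x] at each evaluation point."""
    w = gaussian_kernel(x_eval[:, None] - x_train[None, :], h)
    return (w @ y_train) / w.sum(axis=1)

def self_train(x_lab, y_lab, x_unl, h=0.5, max_iter=100, tol=1e-6):
    """Iterative self-training sketch (illustrative, not the paper's code):
    initialize pseudo-responses on the unlabeled points from the labeled
    fit, then refit on the pooled sample and update the pseudo-responses
    until they stop changing."""
    x_all = np.concatenate([x_lab, x_unl])
    # Step 0: fit on the labeled data only and predict the missing responses.
    y_unl = nw_smooth(x_lab, y_lab, x_unl, h)
    for _ in range(max_iter):
        # Refit on labeled + pseudo-labeled data, then re-predict.
        y_all = np.concatenate([y_lab, y_unl])
        y_new = nw_smooth(x_all, y_all, x_unl, h)
        if np.max(np.abs(y_new - y_unl)) < tol:  # pseudo-labels have converged
            return y_new
        y_unl = y_new
    return y_unl
```

Because the smoother is linear in the responses, each update above is an affine map of the current pseudo-labels, which is what makes a fixed-point convergence analysis of the kind mentioned in the abstract tractable.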
