Iterative Labeling for Semi-Supervised Learning

We propose a unified perspective on a large family of semi-supervised learning algorithms that select and label unlabeled data in an iterative process. We discuss existing approaches that label examples based on the confidence of the current hypothesis, and propose an alternative that labels examples based on empirical risk. The new approach is shown to be statistically reasonable, admits worst-case performance guarantees, and, in our experiments, significantly outperforms confidence-based approaches.
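The iterative-labeling loop described above can be made concrete with a short sketch. The confidence-based rule below mirrors standard self-training; the risk-based variant (greedily adding the pseudo-labeled example whose inclusion minimizes empirical risk on the labeled data) is only one plausible reading of the general idea, not the authors' exact criterion. The classifier, threshold, and function names are illustrative assumptions.

```python
# Minimal sketch of two iterative-labeling strategies, assuming a generic
# scikit-learn-style classifier. Not the paper's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression


def confidence_self_training(X_lab, y_lab, X_unlab, rounds=10, threshold=0.9):
    """Confidence-based rule: each round, pseudo-label the unlabeled points
    the current hypothesis is most confident about and retrain."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        picked = proba.max(axis=1) >= threshold   # confidence criterion
        if not picked.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[picked]])
        y_lab = np.concatenate([y_lab, clf.predict(X_unlab[picked])])
        X_unlab = X_unlab[~picked]
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    return clf


def risk_based_self_training(X_lab, y_lab, X_unlab, rounds=10):
    """Illustrative risk-based rule: each round, tentatively add every
    candidate (example, label) pair and keep the one whose inclusion yields
    the lowest empirical risk on the original labeled set."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    X0, y0 = X_lab, y_lab                         # risk is measured here
    for _ in range(min(rounds, len(X_unlab))):
        best = None
        for i in range(len(X_unlab)):
            for y in np.unique(y0):
                cand = LogisticRegression(max_iter=1000).fit(
                    np.vstack([X_lab, X_unlab[i:i + 1]]),
                    np.concatenate([y_lab, [y]]))
                risk = 1.0 - cand.score(X0, y0)   # empirical risk (0/1 error)
                if best is None or risk < best[0]:
                    best = (risk, i, y, cand)
        _, i, y, clf = best
        X_lab = np.vstack([X_lab, X_unlab[i:i + 1]])
        y_lab = np.concatenate([y_lab, [y]])
        X_unlab = np.delete(X_unlab, i, axis=0)
    return clf
```

The risk-based sketch is deliberately brute-force (it refits one candidate model per unlabeled point and label) to keep the selection criterion explicit; any practical variant would amortize or approximate this search.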
