Iterative Labeling for Semi-Supervised Learning

We propose a unified perspective on a large family of semi-supervised learning algorithms that select and label unlabeled data in an iterative process. We discuss existing approaches that label examples based on the confidence of the current hypothesis, and propose an alternative that labels examples based on empirical risk. The new approach is shown to be statistically reasonable, admits worst-case performance guarantees, and, in our experiments, significantly outperforms confidence-based approaches.
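The iterative-labeling loop described above can be made concrete with a short sketch. The confidence-based rule below mirrors standard self-training; the risk-based variant (greedily adding the pseudo-labeled example whose inclusion minimizes empirical risk on the labeled data) is only one plausible reading of the general idea, not the authors' exact criterion. The classifier, threshold, and function names are illustrative assumptions.

```python
# Minimal sketch of two iterative-labeling strategies, assuming a generic
# scikit-learn-style classifier. Not the paper's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression


def confidence_self_training(X_lab, y_lab, X_unlab, rounds=10, threshold=0.9):
    """Confidence-based rule: each round, pseudo-label the unlabeled points
    the current hypothesis is most confident about and retrain."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        picked = proba.max(axis=1) >= threshold   # confidence criterion
        if not picked.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[picked]])
        y_lab = np.concatenate([y_lab, clf.predict(X_unlab[picked])])
        X_unlab = X_unlab[~picked]
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    return clf


def risk_based_self_training(X_lab, y_lab, X_unlab, rounds=10):
    """Illustrative risk-based rule: each round, tentatively add every
    candidate (example, label) pair and keep the one whose inclusion yields
    the lowest empirical risk on the original labeled set."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    X0, y0 = X_lab, y_lab                         # risk is measured here
    for _ in range(min(rounds, len(X_unlab))):
        best = None
        for i in range(len(X_unlab)):
            for y in np.unique(y0):
                cand = LogisticRegression(max_iter=1000).fit(
                    np.vstack([X_lab, X_unlab[i:i + 1]]),
                    np.concatenate([y_lab, [y]]))
                risk = 1.0 - cand.score(X0, y0)   # empirical risk (0/1 error)
                if best is None or risk < best[0]:
                    best = (risk, i, y, cand)
        _, i, y, clf = best
        X_lab = np.vstack([X_lab, X_unlab[i:i + 1]])
        y_lab = np.concatenate([y_lab, [y]])
        X_unlab = np.delete(X_unlab, i, axis=0)
    return clf
```

The risk-based sketch is deliberately brute-force (it refits one candidate model per unlabeled point and label) to keep the selection criterion explicit; any practical variant would amortize or approximate this search.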
