Combining labeled and unlabeled data with word-class distribution learning

We describe a novel, simple, and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL) and apply it to the task of information extraction (IE), using unlabeled sentences to improve supervised classification. WCDL iteratively builds a class-label distribution for each word in the dictionary by averaging the predicted labels over all occurrences of that word in the unlabeled corpus, and then re-trains a base classifier with these distributions added as word features. In contrast, traditional self-training and co-training methods self-label examples (rather than features), which can degrade performance due to an incestuous learning bias. WCDL exhibits robust behavior and has no difficult parameters to tune. We applied our method to German and English named entity recognition (NER) tasks. WCDL improves over self-training, multi-task semi-supervision, and supervision alone, in particular yielding a state-of-the-art 75.72 F1 score on the German NER task.
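The iterative loop described above is compact enough to sketch. Below is a minimal, hypothetical Python rendering of WCDL as we read it from the abstract: `train_fn` and `predict_proba_fn` are assumed stand-ins for the base classifier's training and per-token class-probability prediction, and how the per-word distributions are encoded as features is delegated to those callables; the paper's actual feature encoding, base classifier, and stopping criterion may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Dist = List[float]
LabeledSentence = Tuple[List[str], List[int]]  # (tokens, gold label ids)

def wcdl(
    train_fn: Callable[[List[LabeledSentence], Dict[str, Dist]], object],
    predict_proba_fn: Callable[[object, List[str], Dict[str, Dist]], List[Dist]],
    labeled: List[LabeledSentence],
    unlabeled: List[List[str]],
    num_classes: int,
    iterations: int = 5,  # assumed; the abstract does not state a count
) -> Tuple[object, Dict[str, Dist]]:
    # Start every word from a uniform class-label distribution.
    uniform = [1.0 / num_classes] * num_classes
    dists: Dict[str, Dist] = defaultdict(lambda: list(uniform))

    # Supervised baseline, trained with the (initially uninformative)
    # distribution features.
    model = train_fn(labeled, dists)

    for _ in range(iterations):
        sums: Dict[str, Dist] = defaultdict(lambda: [0.0] * num_classes)
        counts: Dict[str, int] = defaultdict(int)
        # Average the model's predicted label distribution over all
        # occurrences of each word in the unlabeled corpus.
        for sentence in unlabeled:
            for word, probs in zip(sentence,
                                   predict_proba_fn(model, sentence, dists)):
                for c, p in enumerate(probs):
                    sums[word][c] += p
                counts[word] += 1
        for word, total in sums.items():
            dists[word] = [s / counts[word] for s in total]
        # Re-train the base classifier with the updated distributions
        # added as word features (soft features, not self-labels).
        model = train_fn(labeled, dists)

    return model, dists
```

Note the design point the abstract emphasizes: the unlabeled corpus contributes soft per-word features rather than self-labeled training examples, so a single confident misprediction is diluted by averaging over all occurrences of the word instead of being fed back as a hard label.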
