Semi-Supervised Sequence Labeling with Self-Learned Features

Typical information extraction (IE) systems can be viewed as sequence labeling tasks that assign a label to each word in a natural language sequence, and their performance is limited by the availability of labeled training data. To address this limitation, we propose a semi-supervised approach that improves sequence labeling in IE through a class of algorithms based on {\em self-learned features} (SLF). A supervised classifier is first trained on the annotated text sequences and then used to classify each word in a large set of unannotated sentences. By averaging the predicted labels over all occurrences in the unlabeled corpus, SLF training builds a class label distribution for each word (or word attribute) in the dictionary; these distributions are then added to the model as extra word {\em features}, and the model is re-trained iteratively. The basic SLF variant models how likely a word is to be assigned to each target class; we also propose several extensions, such as learning a word's class boundary distributions. SLF is robust, scalable, and easy to tune. We applied this approach to four classical IE tasks: named entity recognition (German and English), part-of-speech tagging (English), and gene name recognition. Experimental results show consistent improvements over the supervised baselines on all tasks, and the approach compares favorably with the closely related self-training idea.
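
The procedure described above amounts to a simple loop: train on the labeled data, label the unlabeled corpus, average the predictions into per-word class distributions, and re-train with those distributions as extra features. The following is a minimal Python sketch of that loop, not the authors' implementation: the `train_classifier` callback, its `predict` interface, and the feature-dictionary word representation are all illustrative assumptions.

```python
from collections import defaultdict

import numpy as np


def train_slf(labeled_seqs, labeled_tags, unlabeled_seqs,
              train_classifier, num_classes, num_iters=3):
    """Iteratively augment each word with its class-label distribution
    estimated from the model's own predictions on unlabeled text.

    `train_classifier(X, y)` is an assumed callback returning a model
    whose `predict(features)` maps one featurized sentence to a list of
    class indices (hypothetical interface, not from the paper).
    """
    # Self-learned features: word -> class-label distribution.
    # Initially all zeros, i.e. no extra information.
    slf = defaultdict(lambda: np.zeros(num_classes))

    def featurize(sentence):
        # Base feature (the word itself) plus the current SLF values.
        return [dict(word=w, **{f"slf_{c}": slf[w][c]
                                for c in range(num_classes)})
                for w in sentence]

    model = None
    for _ in range(num_iters):
        # 1. (Re-)train the supervised model with the current features.
        X = [featurize(s) for s in labeled_seqs]
        model = train_classifier(X, labeled_tags)

        # 2. Label every word of the unlabeled corpus and accumulate
        #    per-word class counts over all of its occurrences.
        counts = defaultdict(lambda: np.zeros(num_classes))
        for sentence in unlabeled_seqs:
            predicted = model.predict(featurize(sentence))
            for word, tag in zip(sentence, predicted):
                counts[word][tag] += 1.0

        # 3. Average the predictions into distributions; these become
        #    the self-learned features for the next iteration.
        slf = defaultdict(lambda: np.zeros(num_classes),
                          {w: c / c.sum() for w, c in counts.items()})

    return model
```

Unlike self-training, this sketch never adds pseudo-labeled sentences to the training set; the unlabeled corpus only contributes aggregate label distributions, which is what makes the approach comparatively robust to individual prediction errors.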
