Cross-lingual Pseudo-Projected Expectation Regularization for Weakly Supervised Learning

We consider a multilingual weakly supervised learning scenario where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide the learning in other languages. Past approaches project labels across bitext and use them as features or gold labels for training. We propose a new method that projects model expectations rather than labels, which facilitates transfer of model uncertainty across language boundaries. We encode expectations as constraints and train a discriminative CRF model using Generalized Expectation Criteria (Mann and McCallum, 2010). Evaluated on standard Chinese-English and German-English NER datasets, our method achieves F1 scores of 64% and 60%, respectively, when no labeled data is used. Attaining the same accuracy with supervised CRFs requires 12k and 1.5k labeled sentences, respectively. Furthermore, when combined with labeled examples, our method yields significant improvements over state-of-the-art supervised methods, achieving the best reported numbers to date on Chinese OntoNotes and German CoNLL-03 datasets.
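The contrast between projecting labels and projecting expectations can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the token-level word alignment format, and the squared-error penalty are all assumptions made for illustration, and the abstract does not specify the exact form of the constraint.

```python
from typing import Dict, List, Optional, Tuple

Dist = Dict[str, float]  # label -> probability (hypothetical representation)


def project_hard_labels(src_posteriors: List[Dist],
                        alignment: List[Tuple[int, int]],
                        tgt_len: int,
                        default: str = "O") -> List[str]:
    """Label projection: copy the argmax label of each aligned source token."""
    labels = [default] * tgt_len
    for s, t in alignment:
        labels[t] = max(src_posteriors[s], key=src_posteriors[s].get)
    return labels


def project_expectations(src_posteriors: List[Dist],
                         alignment: List[Tuple[int, int]],
                         tgt_len: int) -> List[Optional[Dist]]:
    """Expectation projection: copy the full source posterior, keeping its
    uncertainty. Unaligned target tokens get None (no constraint)."""
    projected: List[Optional[Dist]] = [None] * tgt_len
    for s, t in alignment:
        projected[t] = dict(src_posteriors[s])
    return projected


def expectation_penalty(projected: List[Optional[Dist]],
                        model_posteriors: List[Dist]) -> float:
    """Assumed squared-error penalty between projected expectations and the
    target-side model's expectations, summed over constrained tokens."""
    total = 0.0
    for target, model in zip(projected, model_posteriors):
        if target is None:
            continue
        for label in target.keys() | model.keys():
            total += (target.get(label, 0.0) - model.get(label, 0.0)) ** 2
    return total


# Toy bitext pair: two source tokens aligned (crossed) to a 3-token target.
src = [{"PER": 0.9, "O": 0.1}, {"PER": 0.55, "O": 0.45}]
align = [(0, 1), (1, 0)]
hard = project_hard_labels(src, align, tgt_len=3)
soft = project_expectations(src, align, tgt_len=3)
```

In this toy case the hard projection labels both aligned target tokens `PER`, discarding the fact that the second source token was nearly a coin flip, while the projected expectations retain that uncertainty and could serve as soft targets in a penalty term added to the target-side training objective.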

[1] Erik F. Tjong Kim Sang et al. Introduction to the CoNLL-2003 shared task, 2003.

[2] Avrim Blum et al. The Bottleneck, 2021, Monopsony Capitalism.

[3] Christopher D. Manning et al. The unsupervised learning of natural language structure, 2005.

[4] David Yarowsky et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, 1995, ACL.

[5] Andrew McCallum et al. High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models, 2010, ICML.

[6] Dan Klein et al. Learning Better Monolingual Models with Unannotated Bilingual Text, 2010, CoNLL.

[7] Ming-Wei Chang et al. Unified Expectation Maximization, 2012, NAACL.

[8] Andrew McCallum et al. Alternating Projections for Learning with Expectation Constraints, 2009, UAI.

[9] Joakim Nivre et al. Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging, 2013, TACL.

[10] Gideon S. Mann et al. Leveraging Existing Resources using Generalized Expectation Criteria, 2007.

[11] Manaal Faruqui et al. Training and Evaluating a German Named Entity Recognizer with Semantic Generalization, 2010, KONVENS.

[12] Andrew Y. Ng et al. Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines, 2006, EMNLP.

[13] Andrew McCallum et al. Generalized expectation criteria for lightly supervised learning, 2011.

[14] Steven P. Abney et al. Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora, 2005, IJCNLP.

[15] Ben Taskar et al. Posterior Regularization for Structured Latent Variable Models, 2010, J. Mach. Learn. Res.

[16] Zhifei Li et al. First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests, 2009, EMNLP.

[17] Percy Liang et al. Semi-Supervised Learning for Natural Language, 2005.

[18] Mitchell P. Marcus et al. OntoNotes: The 90% Solution, 2006, NAACL.

[19] Eugene Charniak et al. Effective Self-Training for Parsing, 2006, NAACL.

[20] Ben Taskar et al. Dependency Grammar Induction via Bitext Projection Constraints, 2009, ACL/IJCNLP.

[21] Noah A. Smith et al. Novel estimation methods for unsupervised discovery of latent structure in natural language text, 2007.

[22] Regina Barzilay et al. Unsupervised Multilingual Grammar Induction, 2009, ACL.

[23] Regina Barzilay et al. Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches, 2009, J. Artif. Intell. Res.

[24] Ben Taskar et al. Wiki-ly Supervised Part-of-Speech Tagging, 2012, EMNLP.

[25] Andrew McCallum et al. Active Learning by Labeling Features, 2009, EMNLP.

[26] Jun Suzuki et al. Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data, 2008, ACL.

[27] Gideon S. Mann et al. Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data, 2010, J. Mach. Learn. Res.

[28] Yoram Singer et al. Unsupervised Models for Named Entity Classification, 1999, EMNLP.

[29] Ming-Wei Chang et al. Guiding Semi-Supervision with Constraint-Driven Learning, 2007, ACL.

[30] Dan Klein et al. Two Languages are Better than One (for Syntactic Parsing), 2008, EMNLP.

[31] Yoshua Bengio et al. Word Representations: A Simple and General Method for Semi-Supervised Learning, 2010, ACL.

[32] Srinivas Bangalore et al. Head-Transducer Models for Speech Translation and Their Automatic Acquisition from Bilingual Data, 2004, Machine Translation.

[33] Slav Petrov et al. Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections, 2011, ACL.

[34] Mikhail Belkin et al. A Co-Regularization Approach to Semi-supervised Learning with Multiple Views, 2005.

[35] Estevam R. Hruschka et al. Coupled semi-supervised learning for information extraction, 2010, WSDM '10.

[36] Andrew McCallum et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[37] Rebecca Hwa et al. A Backoff Model for Bootstrapping Resources for Non-English Languages, 2005, HLT/EMNLP.

[38] David Yarowsky et al. Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora, 2001, NAACL.

[39] Tong Zhang et al. A High-Performance Semi-Supervised Learning Method for Text Chunking, 2005, ACL.

[40] Wanxiang Che et al. Effective Bilingual Constraints for Semi-Supervised Learning of Named Entity Recognizers, 2013, AAAI.

[41] Wanxiang Che et al. Named Entity Recognition with Bilingual Constraints, 2013, HLT-NAACL.

[42] Slav Petrov et al. Uptraining for Accurate Deterministic Question Parsing, 2010, EMNLP.