Cross-lingual Name Tagging and Linking for 282 Languages

The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for the 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by applying a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data.
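The annotation-transfer step can be illustrated with a minimal sketch. It is not the framework's implementation: the lookup tables `LANGLINKS` and `EN_TYPES`, the function `project_tags`, the BIO tagging scheme, and the `GPE` label are all assumptions made for illustration. The idea shown is only that an anchor link in a non-English sentence can be followed through a cross-lingual link to an English KB entry, whose type is then projected back onto the anchor span.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables; neither comes from the paper.
# LANGLINKS maps a non-English page title to its English title (cross-lingual link).
LANGLINKS: Dict[str, str] = {"Türkiye": "Turkey", "Ankara": "Ankara"}
# EN_TYPES maps an English KB title to an assumed coarse-grained entity type.
EN_TYPES: Dict[str, str] = {"Turkey": "GPE", "Ankara": "GPE"}


def project_tags(tokens: List[str], anchors: List[Tuple[int, int, str]]) -> List[str]:
    """Project English KB types onto anchor spans as BIO tags.

    Each anchor is (start, end, target_title) with `end` exclusive.
    Anchors without a cross-lingual link or a known type are left as "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, title in anchors:
        en_title = LANGLINKS.get(title)
        etype = EN_TYPES.get(en_title) if en_title is not None else None
        if etype is None:
            continue  # not linkable to a typed English entry: no silver label
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


tokens = ["Ankara", "Türkiye'nin", "başkentidir", "."]
anchors = [(0, 1, "Ankara"), (1, 2, "Türkiye")]
silver = project_tags(tokens, anchors)
```

In this sketch, spans whose target page has no English counterpart simply stay unlabeled, which is one plausible reason such silver-standard annotations would need the later refinement steps the abstract mentions.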
