Zero-shot Neural Transfer for Cross-lingual Entity Linking

Cross-lingual entity linking maps an entity mention in a source language to its corresponding entry in a structured knowledge base that is in a different (target) language. While previous work relies heavily on bilingual lexical resources to bridge the gap between the source and the target languages, these resources are scarce or unavailable for many low-resource languages. To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language. Specifically, we propose pivot-based entity linking, which leverages information from a high-resource “pivot” language to train character-level neural entity linking models that are transferred to the source low-resource language in a zero-shot manner. With experiments on 9 low-resource languages and transfer through a total of 54 languages, we show that our proposed pivot-based framework improves entity linking accuracy by 17% (absolute) on average over the baseline systems in the zero-shot scenario. Further, we investigate the use of language-universal phonological representations, which improve average accuracy by 36% (absolute) when transferring between languages that use different scripts.
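To make the linking step concrete, the following is a minimal sketch of character-level candidate ranking: a mention is scored against every knowledge base entry and the best-scoring entry is returned. It is an illustration under stated assumptions, not the model described above. The function names (`char_ngrams`, `cosine`, `link`) are hypothetical, and an untrained character-bigram cosine similarity stands in for the neural encoder that the abstract says is trained on pivot-language data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=2):
    """Bag of character n-grams, with boundary markers (hypothetical helper)."""
    padded = f"<{text.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link(mention, kb_entries):
    """Return the KB entry whose character n-grams best match the mention.

    Assumption: this untrained similarity is only a stand-in for a learned
    character-level encoder; it shows the ranking step, not the training.
    """
    mention_vec = char_ngrams(mention)
    return max(kb_entries, key=lambda entry: cosine(mention_vec, char_ngrams(entry)))


# Toy usage: a source-language spelling is linked to an English KB title.
best = link("Londona", ["London", "Paris", "Berlin"])
```

Because the scoring operates on characters rather than words, a scorer of this shape can be applied to a language it was never fitted on, provided the scripts overlap. This is the property that zero-shot transfer from a pivot language relies on.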
