Design Challenges in Low-resource Cross-lingual Entity Linking

Cross-lingual Entity Linking (XEL) grounds mentions of entities that appear in foreign (source) language text to an English (target) knowledge base (KB) such as Wikipedia. XEL consists of two steps: candidate generation, which retrieves a list of candidate entities for each mention, followed by candidate ranking. XEL methods have been successful on high-resource languages but generally perform poorly on low-resource languages because of the lack of supervision. In this paper, we present a thorough analysis of existing low-resource XEL methods, focusing on their candidate generation methods and limitations. We report two main findings: (1) these methods are heavily limited by the coverage of Wikipedia's bilingual resources, and (2) they perform better on Wikipedia text than on real-world text such as news or Twitter. We therefore claim that, in the low-resource setting, cross-lingual resources from outside Wikipedia are essential. To support this claim, we propose a simple but effective zero-shot framework, CogCompXEL, that complements current methods by utilizing query log mapping files from online search engines. CogCompXEL outperforms current state-of-the-art models on almost all 25 languages of the LORELEI dataset, achieving an absolute average increase of 25% in gold candidate recall.
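To make the evaluation metric concrete, the following is a minimal sketch of how gold candidate recall could be computed, and of how a second candidate source might complement a Wikipedia-based one. It is not the CogCompXEL implementation: the function names, the mention IDs, and the toy candidate lists are all hypothetical, and the merge step is assumed to be a plain order-preserving union.

```python
from typing import Dict, List


def merge_candidates(
    primary: Dict[str, List[str]], extra: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Union two candidate sources per mention, keeping first-seen order.

    Assumption: a simple union stands in for however a real system
    combines Wikipedia-based and outside-Wikipedia candidates.
    """
    merged: Dict[str, List[str]] = {}
    for mention in set(primary) | set(extra):
        seen: List[str] = []
        for entity in primary.get(mention, []) + extra.get(mention, []):
            if entity not in seen:
                seen.append(entity)
        merged[mention] = seen
    return merged


def gold_candidate_recall(
    candidates: Dict[str, List[str]], gold: Dict[str, str]
) -> float:
    """Fraction of mentions whose gold entity is among the generated candidates."""
    if not gold:
        return 0.0
    hits = sum(1 for m, g in gold.items() if g in candidates.get(m, []))
    return hits / len(gold)


# Hypothetical toy data: mention IDs mapped to English Wikipedia titles.
gold = {
    "m1": "Addis_Ababa",
    "m2": "Blue_Nile",
    "m3": "Haile_Selassie",
    "m4": "Oromia_Region",
}
wiki_candidates = {
    "m1": ["Addis_Ababa", "Addis_Ababa_University"],
    "m2": ["Nile"],
    "m3": [],
    "m4": ["Oromia_Region"],
}
# Stand-in for candidates recovered from an outside-Wikipedia resource.
extra_candidates = {"m2": ["Blue_Nile"], "m3": ["Haile_Selassie"]}

recall_wiki = gold_candidate_recall(wiki_candidates, gold)
recall_merged = gold_candidate_recall(
    merge_candidates(wiki_candidates, extra_candidates), gold
)
print(recall_wiki, recall_merged)
```

On this toy data, the Wikipedia-only source finds the gold entity for two of four mentions, and the merged source finds all four. The numbers only illustrate the metric and say nothing about the reported results.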
