Design Challenges in Low-resource Cross-lingual Entity Linking

Cross-lingual Entity Linking (XEL) grounds mentions of entities that appear in foreign (source) language text to an English (target) knowledge base (KB) such as Wikipedia. XEL consists of two steps: candidate generation, which retrieves a list of candidate entities for each mention, followed by candidate ranking. XEL methods have been successful on high-resource languages, but generally perform poorly on low-resource languages due to a lack of supervision. In this paper, we present a thorough analysis of existing low-resource XEL methods, focusing on their candidate generation methods and limitations. We report two findings: (1) they are heavily limited by the coverage of Wikipedia's bilingual resources; (2) they perform better on Wikipedia text than on real-world text such as news or Twitter. We argue that, in the low-resource setting, cross-lingual resources from outside Wikipedia are essential. To support this claim, we propose a simple but effective zero-shot framework, CogCompXEL, that complements current methods by utilizing query log mapping files from online search engines. CogCompXEL outperforms current state-of-the-art models on almost all 25 languages of the LORELEI dataset, achieving an average absolute increase of 25% in gold candidate recall.
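As a rough illustration of the two-step structure and of the gold candidate recall metric mentioned above, the following is a minimal sketch and not the CogCompXEL implementation. The lookup tables `WIKI_MAP` and `EXTRA_MAP`, their scores, and all function names are hypothetical; `EXTRA_MAP` merely stands in for an outside-Wikipedia resource, and sorting by a stored score stands in for a real candidate ranker.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources: source-language mention -> [(English KB title, score)].
Resource = Dict[str, List[Tuple[str, float]]]

WIKI_MAP: Resource = {"Париж": [("Paris,_Texas", 0.1), ("Paris", 0.9)]}
EXTRA_MAP: Resource = {"Лондон": [("London", 1.0)]}  # stand-in for an outside-Wikipedia mapping


def generate_candidates(mention: str, resources: List[Resource]) -> List[Tuple[str, float]]:
    """Step 1: gather candidates from each resource in order, skipping repeated titles."""
    seen = set()
    candidates = []
    for resource in resources:
        for title, score in resource.get(mention, []):
            if title not in seen:
                seen.add(title)
                candidates.append((title, score))
    return candidates


def rank_candidates(candidates: List[Tuple[str, float]]) -> List[str]:
    """Step 2: order candidates by score, highest first (placeholder for a real ranker)."""
    return [title for title, _ in sorted(candidates, key=lambda c: c[1], reverse=True)]


def gold_candidate_recall(mentions: List[str], gold: List[str], resources: List[Resource]) -> float:
    """Fraction of mentions whose gold entity appears anywhere in the generated candidate list."""
    if not mentions:
        return 0.0
    hits = 0
    for mention, gold_title in zip(mentions, gold):
        titles = {title for title, _ in generate_candidates(mention, resources)}
        if gold_title in titles:
            hits += 1
    return hits / len(mentions)


mentions = ["Париж", "Лондон"]
gold = ["Paris", "London"]
recall_wiki_only = gold_candidate_recall(mentions, gold, [WIKI_MAP])
recall_with_extra = gold_candidate_recall(mentions, gold, [WIKI_MAP, EXTRA_MAP])
```

In this toy setup the second mention has no entry in `WIKI_MAP`, so recall rises from 0.5 to 1.0 once the extra resource is consulted. Ranking cannot recover a gold entity that candidate generation never retrieved, which is why recall at the generation step matters.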

Dan Roth | Xiaodong Yu | Xingyu Fu | Zian Zhao | Weijia Shi
