Linking Named Entities across Languages using Multilingual Word Embeddings

Digital libraries are online collections of digital objects that can include text, images, audio, or video in several languages. It has long been observed that named entities (NEs) are key to accessing digital library portals, as they appear in most user queries. However, NEs can be spelled differently in each language, which reduces the effectiveness of user queries at retrieving documents across languages. Cross-lingual named entity linking (XEL) connects NEs from documents in a source language to an external knowledge base in another (target) language. The XEL task is especially challenging due to the diversity of NEs across languages and contexts. This paper describes an XEL system applied to and evaluated on several language pairs that combine English with low-resource languages from different linguistic families, such as Croatian, Finnish, Estonian, and Slovenian. We tested this approach by analyzing documents and NEs in low-resource languages and linking them to the English version of Wikipedia. We present the results of this analysis and discuss the challenges posed by degraded documents from digital libraries. Future work will extensively analyze the impact of our approach on the XEL task with OCRed documents.
