One for All: Towards Language Independent Named Entity Linking

Entity linking (EL) is the task of disambiguating mentions in text by associating them with entries in a predefined database of entities (persons, organizations, etc.). Most previous EL research has focused on a single language, English, with less attention paid to other languages such as Spanish or Chinese. In this paper, we introduce LIEL, a Language Independent Entity Linking system: an EL framework that, once trained on one language, works remarkably well on a number of different languages without change. LIEL makes a joint global prediction over the entire document, employing a discriminative reranking framework with many domain- and language-independent feature functions. Experiments on numerous benchmark datasets show that the proposed system, once trained on one language, English, outperforms several state-of-the-art systems in English (by 4 points), and that the trained model also works very well on Spanish (14 points better than a competitor system), demonstrating the viability of the approach.
