Word Embeddings for Unsupervised Named Entity Linking

The amount of user-generated text on the Web has grown enormously in the last decade, creating new opportunities for a range of real-world applications and domains. In particular, microblogging platforms enable the collection of continuously and instantly updated information. Organizing these contents and extracting valuable knowledge from them is fundamental to ensuring profitability and efficiency for companies and institutions. This paper presents an unsupervised model for the task of Named Entity Linking in microblogging environments. The aim is to link the named entity mentions in a text to their corresponding knowledge-base entries by exploiting a novel heterogeneous representation space, obtained through Word Embeddings, that is characterized by more meaningful similarity measures between words and named entities. The proposed model has been evaluated on several benchmark datasets proposed for Named Entity Linking challenges in English and Italian. It obtains very promising performance given the highly challenging environment of user-generated content on microblogging platforms.
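
To make the idea of a shared representation space concrete, the following is a minimal sketch, not the paper's actual model. It assumes that words and knowledge-base entities already have vectors in one common space, represents a mention by the mean of its context word vectors, and ranks candidate entities by cosine similarity. The toy vectors, the `ENTITY/` naming scheme, the `link` function, and the similarity threshold for rejecting a link are all hypothetical choices made for illustration.

```python
import math

# Toy vectors in one shared space for words and entities.
# Hypothetical values, not taken from the paper or from any trained model.
EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.3],
    "iphone": [0.8, 0.0, 0.5],
    "launch": [0.6, 0.1, 0.6],
    "ENTITY/Apple_Inc.": [0.85, 0.05, 0.5],
    "ENTITY/Apple_(fruit)": [0.2, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens):
    """Average the vectors of the known tokens; None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def link(tokens, candidates, threshold=0.5):
    """Return (best_entity, score), or (None, score) when no candidate is
    similar enough. The threshold is an assumption of this sketch."""
    context = mean_vector(tokens)
    if context is None:
        return None, 0.0
    scored = [
        (cosine(context, EMBEDDINGS[c]), c) for c in candidates if c in EMBEDDINGS
    ]
    if not scored:
        return None, 0.0
    best_score, best_entity = max(scored)
    if best_score < threshold:
        return None, best_score
    return best_entity, best_score


if __name__ == "__main__":
    tweet_tokens = ["apple", "iphone", "launch"]
    candidates = ["ENTITY/Apple_Inc.", "ENTITY/Apple_(fruit)"]
    print(link(tweet_tokens, candidates))
```

Because the approach needs no labelled mention-entity pairs, only precomputed vectors and a similarity function, a sketch of this kind stays unsupervised; in practice the vectors would come from a trained embedding model rather than a hand-written dictionary.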
