Entity Linking for Queries by Searching Wikipedia Sentences

We present a simple yet effective approach to linking entities in queries. The key idea is to retrieve Wikipedia sentences that are similar to the query and to use the human-annotated entities in those sentences directly as candidate entities for the query. We then rank the candidates in a regression-based framework using a rich set of features, including link probability, context matching, word embeddings, and relatedness among the candidate entities and their related entities. The approach has two advantages, both of which benefit the ranking process and the final linking result. First, it greatly reduces the number of candidate entities, because entities irrelevant to the words in the query are filtered out. Second, it provides a query-sensitive prior probability in addition to the static link probability derived from all Wikipedia articles. We evaluate on two benchmark datasets for entity linking in queries, ERD14 and GERDAQ. Experimental results show that our method outperforms state-of-the-art systems, achieving an F1 of 75.0% on ERD14 and 56.9% on GERDAQ.
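The candidate-generation idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corpus `SENTENCES`, the Jaccard token overlap standing in for a real sentence search engine, the helper names `overlap` and `candidate_entities`, the `top_k` cutoff, and the simple count-based estimate of the query-sensitive prior.

```python
from collections import Counter

# Hypothetical toy corpus: each Wikipedia sentence is paired with the
# entities that human editors annotated in it (via anchor links).
SENTENCES = [
    ("the jaguar is a large cat native to the americas", ["Jaguar"]),
    ("jaguar cars is a british luxury car maker", ["Jaguar Cars"]),
    ("the jaguar xk is a luxury car produced by jaguar cars",
     ["Jaguar XK", "Jaguar Cars"]),
    ("python is a widely used programming language",
     ["Python (programming language)"]),
]


def overlap(query_tokens, sentence_tokens):
    """Jaccard similarity of two token sets (stand-in for a search score)."""
    union = query_tokens | sentence_tokens
    return len(query_tokens & sentence_tokens) / len(union) if union else 0.0


def candidate_entities(query, sentences=SENTENCES, top_k=2):
    """Return {entity: query-sensitive prior} from the top_k similar sentences."""
    q = set(query.lower().split())
    scored = [(overlap(q, set(text.split())), ents) for text, ents in sentences]
    # Keep only sentences that share at least one word with the query.
    matched = [pair for pair in scored if pair[0] > 0]
    top = sorted(matched, key=lambda pair: pair[0], reverse=True)[:top_k]
    # Count how often each annotated entity occurs in the retrieved sentences.
    counts = Counter(entity for _, ents in top for entity in ents)
    total = sum(counts.values())
    return {entity: count / total for entity, count in counts.items()}


if __name__ == "__main__":
    print(candidate_entities("jaguar luxury car"))
```

For the query "jaguar luxury car", the two best-matching sentences in this toy corpus are the car-related ones, so the animal sense of "Jaguar" never enters the candidate set, and the remaining candidates receive a prior that depends on the query rather than on corpus-wide link statistics. In a full system these priors would be one feature among those passed to the regression-based ranker.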
