A Topic-Sensitive Model for Salient Entity Linking

In recent years, the amount of entities in large knowledge bases available on the Web has been increasing rapidly. Such entities can be used to bridge textual data with knowledge bases and thus help with many tasks, such as text understanding, word sense disambiguation and information retrieval. The key issue is to link the entity mentions in documents with the corresponding entities in knowledge bases, referred to as entity linking. In addition, for many entity-centric applications, entity salience for a document has become a very important factor. This raises an impending need to identify a set of salient entities that are central to the input document. In this paper, we introduce a new task of salient entity linking and propose a graph-based disambiguation solution, which integrates several features, especially a topic-sensitive model based on Wikipedia categories. Experimental results show that our method significantly outperforms the state-of-the-art entity linking methods in terms of precision, recall and F-measure.

[1]  Paolo Ferragina,et al.  From TagME to WAT: a new entity annotator , 2014, ERD '14.

[2]  Robert Tibshirani,et al.  Classification by Pairwise Coupling , 1997, NIPS.

[3]  Michael Gamon,et al.  Identifying salient entities in web pages , 2013, CIKM.

[4]  Paolo Ferragina,et al.  Fast and Accurate Annotation of Short Texts with Wikipedia Pages , 2010, IEEE Software.

[5]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[6]  Jun Zhao,et al.  Collective entity linking in web text: a graph-based method , 2011, SIGIR.

[7]  Ian H. Witten,et al.  Learning to link with wikipedia , 2008, CIKM '08.

[8]  Raphaël Troncy,et al.  Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web , 2014, LREC.

[9]  S. Sathiya Keerthi,et al.  Improvements to Platt's SMO Algorithm for SVM Classifier Design , 2001, Neural Computation.

[11]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[12]  Sören Auer,et al.  AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data , 2014, International Semantic Web Conference.

[13]  Christian Bizer,et al.  DBpedia spotlight: shedding light on the web of documents , 2011, I-Semantics '11.

[14]  Sebastian Hellmann,et al.  N³ - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format , 2014, LREC.

[15]  Taher H. Haveliwala Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search , 2003, IEEE Trans. Knowl. Data Eng..

[16]  Raphaël Troncy,et al.  Learning with the Web: Spotting Named Entities on the Intersection of NERD and Machine Learning , 2013, #MSM.

[17]  Ian H. Witten,et al.  An effective, low-cost measure of semantic relatedness obtained from Wikipedia links , 2008 .