Exploiting semantic similarity for named entity disambiguation in knowledge graphs

Abstract With the increasing popularity of large-scale Knowledge Graphs (KGs), many applications such as semantic analysis, search and question answering need to link entity mentions in text to entities in KGs. Because natural language is polysemous, entity disambiguation is a key problem in current research. Existing disambiguation methods use entity prominence, context similarity and entity-entity relatedness to discriminate between ambiguous entities. They mainly target document- or paragraph-level texts that contain rich contextual information, and they compute context similarity by lexical matching. On short texts with limited contextual information, such as web queries, questions and tweets, these conventional methods struggle to handle a single entity mention and to measure context similarity. To improve context-similarity-based disambiguation on such short texts, we propose the SCSNED method, which disambiguates entities based on the semantic similarity between contextual words and informative words of entities in KGs. Specifically, we exploit both knowledge-based and corpus-based semantic similarity methods for entity disambiguation with SCSNED. Moreover, we propose Category2Vec, an embedding model based on joint learning of word and category embeddings, in order to compute word-category similarity for entity disambiguation. We illustrate the proposed methods with examples and evaluate them in a comparative experiment on entity disambiguation in real-world web queries, questions and tweets. The experimental results identify the effectiveness of the different semantic similarity methods and demonstrate that the semantic similarity methods in SCSNED and Category2Vec improve over the conventional context similarity baseline.
We further compare the proposed approaches with state-of-the-art entity disambiguation systems and show that they are among the best-performing systems. In addition, an important feature of the proposed approaches is that they can potentially be applied to any existing KG, since they mainly use features that are common across KGs, namely entity descriptions and categories. Another contribution of the paper is an updated survey of the background on entity disambiguation in KGs and on semantic similarity methods.
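The core idea, ranking candidate entities by the semantic similarity between the words around a mention and the informative words of each entity, can be illustrated with a minimal sketch. This is not the SCSNED scoring function itself, which the abstract does not specify. The toy vectors, the candidate entities, their informative words, and the helper names `cosine`, `entity_score` and `disambiguate` are all hypothetical, and the aggregation (the average of each context word's best match) is an assumption chosen for simplicity.

```python
import math

# Toy 3-dimensional word vectors: illustrative assumptions, not trained embeddings.
TOY_VECTORS = {
    "fruit": (0.9, 0.1, 0.0),
    "juice": (0.8, 0.2, 0.1),
    "pie": (0.85, 0.05, 0.1),
    "company": (0.1, 0.9, 0.2),
    "iphone": (0.0, 0.95, 0.3),
    "computer": (0.1, 0.85, 0.4),
}

# Hypothetical candidate entities for the mention "apple", each with a few
# informative words (for example, taken from its description or categories).
CANDIDATES = {
    "Apple_Inc.": ["company", "iphone", "computer"],
    "Apple_(fruit)": ["fruit", "pie"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def entity_score(context_words, entity_words, vectors=TOY_VECTORS):
    """Average, over known context words, of the best similarity to any entity word."""
    best_matches = []
    for word in context_words:
        if word not in vectors:
            continue  # skip out-of-vocabulary context words
        best = max(
            (cosine(vectors[word], vectors[e]) for e in entity_words if e in vectors),
            default=0.0,
        )
        best_matches.append(best)
    return sum(best_matches) / len(best_matches) if best_matches else 0.0


def disambiguate(context_words, candidates, vectors=TOY_VECTORS):
    """Return the candidate entity whose informative words best match the context."""
    return max(
        candidates,
        key=lambda name: entity_score(context_words, candidates[name], vectors),
    )


if __name__ == "__main__":
    # Short query "apple juice": the only context word is "juice".
    print(disambiguate(["juice"], CANDIDATES))
```

With these toy vectors, the single context word "juice" is enough to select the fruit entity, even though it shares no word with either candidate's informative words. This is the situation in which lexical matching fails on short texts. A knowledge-based measure (for instance, one computed over a taxonomy) could be substituted for `cosine` without changing the rest of the sketch.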
