Entity Disambiguation with Linkless Knowledge Bases

Named Entity Disambiguation is the task of resolving named entity mentions in natural language text and linking them to their corresponding entries in a reference knowledge base (e.g., Wikipedia). Such disambiguation can add semantics to plain text and distinguish homonymous entities. Previous research has tackled this problem with two types of context-aware features derived from the reference knowledge base: context similarity and semantic relatedness. Both features rely heavily on the cross-document hyperlinks within the knowledge base. The semantic relatedness feature is measured directly via those hyperlinks, while the context similarity feature uses them implicitly to expand the descriptions of entity candidates before comparing those descriptions against the query context. Unfortunately, cross-document hyperlinks are rarely available in closed-domain knowledge bases, and adding such links manually is very expensive. Consequently, few algorithms work well on linkless knowledge bases. In this work, we propose the challenging problem of Named Entity Disambiguation with Linkless Knowledge Bases (LNED) and tackle it by leveraging the useful disambiguation evidence scattered across the reference knowledge base. We propose a generative model that automatically mines such evidence from noisy information. The mined evidence can mimic the role of the missing links and help boost LNED performance. Experimental results show that our proposed method substantially improves disambiguation accuracy over the baseline approaches.
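
To make the two conventional features concrete, the following is a minimal sketch of how they are commonly computed, and of why the link-based one breaks down without hyperlinks. It is not the generative model proposed in this work. The function names, the whitespace tokenization, and the use of a Milne and Witten style incoming-link measure for relatedness are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def context_similarity(query_context: str, entity_description: str) -> float:
    """Cosine similarity between bag-of-words vectors (illustrative tokenization)."""
    q = Counter(query_context.lower().split())
    d = Counter(entity_description.lower().split())
    dot = sum(q[w] * d[w] for w in q.keys() & d.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def link_relatedness(inlinks_a: set, inlinks_b: set, kb_size: int) -> float:
    """Relatedness of two entities from their shared incoming links.

    Assumed form: a Milne and Witten style normalized link distance,
    converted to a similarity and clipped at zero.
    """
    common = inlinks_a & inlinks_b
    if not common:
        # No shared links: in a linkless knowledge base this is always the case.
        return 0.0
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    if kb_size <= small:
        raise ValueError("kb_size must exceed the size of either link set")
    distance = (math.log(big) - math.log(len(common))) / (
        math.log(kb_size) - math.log(small)
    )
    return max(0.0, 1.0 - distance)
```

With empty link sets, `link_relatedness` returns 0.0 for every entity pair, so the feature carries no signal. Only the text-based similarity remains, and it operates on unexpanded descriptions. This is the gap that evidence mined from the knowledge base is meant to fill.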
