SHINE+: A General Framework for Domain-Specific Entity Linking with Heterogeneous Information Networks

Heterogeneous information networks that consist of multi-type, interconnected objects are becoming increasingly popular, such as social media networks and bibliographic networks. The task of linking named entity mentions detected from unstructured Web text with their corresponding entities in a heterogeneous information network is of practical importance for the problem of information network population. This task is challenging due to name ambiguity and limited knowledge existing in the network. Most existing entity linking methods focus on linking entities with Wikipedia and cannot be applied to our task. In this paper, we present SHINE+, a general framework for linking named entitieS in Web free text with a Heterogeneous I nformation NEtwork. We propose a probabilistic linking model, which unifies an entity popularity model with an entity object model. As the entity knowledge contained in the information network is insufficient, we propose a knowledge population algorithm to iteratively enrich the network entity knowledge by leveraging the context information of mentions mapped by the linking model with high confidence, which subsequently boosts the linking performance. Experimental results over two real heterogeneous information networks (i.e., DBLP and IMDb) demonstrate the effectiveness and efficiency of our proposed framework in comparison with the baselines.

[1]  Jiawei Han,et al.  A probabilistic model for linking named entities in web text with heterogeneous information networks , 2014, SIGMOD Conference.

[2]  Heng Ji,et al.  Knowledge Base Population: Successful Approaches and Challenges , 2011, ACL.

[3]  Ganesh Ramakrishnan,et al.  Collective annotation of Wikipedia entities in web text , 2009, KDD.

[4]  Marcos André Gonçalves,et al.  A brief survey of automatic methods for author name disambiguation , 2012, SGMD.

[5]  Weiyi Meng,et al.  A Latent Topic Model for Complete Entity Resolution , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[6]  Estevam R. Hruschka,et al.  Coupled semi-supervised learning for information extraction , 2010, WSDM '10.

[7]  Philip S. Yu,et al.  Object Distinction: Distinguishing Objects with Identical Names , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[8]  Divesh Srivastava,et al.  Linking temporal records , 2011, Frontiers of Computer Science.

[9]  Ian H. Witten,et al.  An effective, low-cost measure of semantic relatedness obtained from Wikipedia links , 2008 .

[10]  Gerhard Weikum,et al.  Discovering emerging entities with ambiguous names , 2014, WWW.

[11]  Razvan C. Bunescu,et al.  Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.

[12]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[13]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[14]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[15]  Xianpei Han,et al.  A Generative Entity-Mention Model for Linking Entities with Knowledge Base , 2011, ACL.

[16]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[17]  Paolo Ferragina,et al.  TAGME: on-the-fly annotation of short text fragments (by wikipedia entities) , 2010, CIKM.

[18]  Doug Downey,et al.  Local and Global Algorithms for Disambiguation to Wikipedia , 2011, ACL.

[19]  Christopher Joseph Pal,et al.  Improving Author Coreference by Resource-Bounded Information Gathering from the Web , 2007, IJCAI.

[20]  Wei Shen,et al.  Linking named entities in Tweets with knowledge base via user interest modeling , 2013, KDD.

[21]  Wei Shen,et al.  LIEGE:: link entities in web lists with knowledge base , 2012, KDD.

[22]  Silviu Cucerzan,et al.  Large-Scale Named Entity Disambiguation Based on Wikipedia Data , 2007, EMNLP.

[23]  Giuseppe Ottaviano,et al.  Fast and Space-Efficient Entity Linking for Queries , 2015, WSDM.

[24]  Divesh Srivastava,et al.  Linking temporal records , 2011, VLDB 2011.

[25]  Anand Rajaraman,et al.  Building, maintaining, and using knowledge bases: a report from the trenches , 2013, SIGMOD '13.

[26]  Yang Li,et al.  Mining evidences for named entity disambiguation , 2013, KDD.

[27]  Ni Lao,et al.  Relational retrieval using a combination of path-constrained random walks , 2010, Machine Learning.

[28]  Ming-Wei Chang,et al.  To Link or Not to Link? A Study on End-to-End Tweet Entity Linking , 2013, NAACL.

[29]  Vincent Ng,et al.  Sieve-Based Entity Linking for the Biomedical Domain , 2015, ACL.

[30]  Michael Ley,et al.  DBLP - Some Lessons Learned , 2009, Proc. VLDB Endow..

[31]  Haixun Wang,et al.  Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[32]  Philip S. Yu,et al.  ADANA: Active Name Disambiguation , 2011, 2011 IEEE 11th International Conference on Data Mining.

[33]  Philip S. Yu,et al.  Integrating meta-path selection with user-guided object clustering in heterogeneous information networks , 2012, KDD.

[34]  Wei Shen,et al.  LINDEN: linking named entities with knowledge base via semantic knowledge , 2012, WWW.

[35]  Ravi Kumar,et al.  Object matching in tweets with spatial models , 2012, WSDM '12.

[36]  Jiawei Han,et al.  Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions , 2015, IEEE Transactions on Knowledge and Data Engineering.

[37]  Patrick Pantel,et al.  Jigs and Lures: Associating Web Queries with Structured Entities , 2011, ACL.

[38]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[39]  Gerhard Weikum,et al.  Robust Disambiguation of Named Entities in Text , 2011, EMNLP.

[40]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.