Linking named entities in Tweets with knowledge base via user interest modeling

Twitter has become an increasingly important source of information, with more than 400 million tweets posted per day. The task to link the named entity mentions detected from tweets with the corresponding real world entities in the knowledge base is called tweet entity linking. This task is of practical importance and can facilitate many different tasks, such as personalized recommendation and user interest discovery. The tweet entity linking task is challenging due to the noisy, short, and informal nature of tweets. Previous methods focus on linking entities in Web documents, and largely rely on the context around the entity mention and the topical coherence between entities in the document. However, these methods cannot be effectively applied to the tweet entity linking task due to the insufficient context information contained in a tweet. In this paper, we propose KAURI, a graph-based framework to collectively link all the named entity mentions in all tweets posted by a user via modeling the user's topics of interest. Our assumption is that each user has an underlying topic interest distribution over various named entities. KAURI integrates the intra-tweet local information with the inter-tweet user interest information into a unified graph-based framework. We extensively evaluated the performance of KAURI over manually annotated tweet corpus, and the experimental results show that KAURI significantly outperforms the baseline methods in terms of accuracy, and KAURI is efficient and scales well to tweet stream.

[1]  Ganesh Ramakrishnan,et al.  Collective annotation of Wikipedia entities in web text , 2009, KDD.

[2]  Jun Zhao,et al.  Collective entity linking in web text: a graph-based method , 2011, SIGIR.

[3]  Ravi Kumar,et al.  Object matching in tweets with spatial models , 2012, WSDM '12.

[4]  M. de Rijke,et al.  Adding semantics to microblog posts , 2012, WSDM '12.

[5]  Tim Berners-Lee,et al.  Linked data , 2020, Semantic Web for the Working Ontologist.

[6]  Ian H. Witten,et al.  An effective, low-cost measure of semantic relatedness obtained from Wikipedia links , 2008 .

[7]  Fabian M. Suchanek,et al.  Yago: A Core of Semantic Knowledge Unifying WordNet and Wikipedia , 2007 .

[8]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[9]  Tim Berners-Lee,et al.  Linked Data - The Story So Far , 2009, Int. J. Semantic Web Inf. Syst..

[10]  Taher H. Haveliwala Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[11]  Matthew Michelson,et al.  Tweet Disambiguate Entities Retrieve Folksonomy SubTree Step 1 : Discover Categories Generate Topic Profile from SubTrees Step 2 : Discover Profile Topic Profile : “ English Football ” “ World Cup ” , 2010 .

[12]  Haixun Wang,et al.  Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[13]  Bu-Sung Lee,et al.  TwiNER: named entity recognition in targeted twitter stream , 2012, SIGIR '12.

[14]  Razvan C. Bunescu,et al.  Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.

[15]  Gerhard Weikum,et al.  Robust Disambiguation of Named Entities in Text , 2011, EMNLP.

[16]  Michael S. Bernstein,et al.  Short and tweet: experiments on recommending content from information streams , 2010, CHI.

[17]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[18]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[19]  Qing Yang,et al.  Discovering User Interest on Twitter with a Modified Author-Topic Model , 2011, 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology.

[20]  Wei Shen,et al.  LIEGE:: link entities in web lists with knowledge base , 2012, KDD.

[21]  Wei Shen,et al.  A graph-based approach for ontology population with named entities , 2012, CIKM '12.

[22]  Silviu Cucerzan,et al.  Large-Scale Named Entity Disambiguation Based on Wikipedia Data , 2007, EMNLP.

[23]  Qi He,et al.  TwitterRank: finding topic-sensitive influential twitterers , 2010, WSDM '10.

[24]  Ming Zhou,et al.  Recognizing Named Entities in Tweets , 2011, ACL.

[25]  Yong Yu,et al.  Collaborative personalized tweet recommendation , 2012, SIGIR '12.

[26]  Wei Shen,et al.  LINDEN: linking named entities with knowledge base via semantic knowledge , 2012, WWW.