Entity disambiguation in tweets leveraging user social profiles

Pervasive web and social networks are becoming part of everyone's life. Through their activities on these networks, users leave traces of their expertise, interests, and personalities. With advances in Web mining and user modeling techniques, it is possible to leverage a user's social network activity history to extract the semantics of user-generated content. In this work, we explore various techniques for constructing user profiles based on the content users publish on social networks. We further show that one advantage of maintaining social network user profiles is that they provide context for better understanding microposts. We propose and experimentally evaluate different approaches to entity disambiguation in social networks, based on syntactic and semantic features, on top of two different social networks: a general-interest network (Twitter) and a domain-specific network (StackOverflow). We demonstrate that disambiguation accuracy increases when considering enriched user profiles that integrate content from both social networks.
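To make the idea of profile-based disambiguation concrete, the following is a minimal sketch, not the method evaluated in this work. It assumes a simple bag-of-words user profile and cosine similarity against short candidate descriptions; the function names, the example posts, and the candidate descriptions are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#+]+", text.lower())


def bag_of_words(texts):
    """Build a term-frequency vector from a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(tweet, user_posts, candidates):
    """Pick the candidate whose description best matches tweet + profile.

    `candidates` maps a candidate entity name to a short textual
    description (an assumption of this sketch, not a fixed resource).
    """
    context = bag_of_words([tweet] + user_posts)
    scores = {
        name: cosine(context, bag_of_words([description]))
        for name, description in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: one ambiguous tweet plus posts from the same user
# on a domain-specific network.
tweet = "Just got a new apple, love it"
user_posts = [
    "How do I debug a Swift app on my iPhone?",
    "Xcode build fails after macOS update",
]
candidates = {
    "Apple Inc.": "technology company iphone macos xcode swift",
    "apple (fruit)": "fruit tree orchard sweet juice",
}

best, scores = disambiguate(tweet, user_posts, candidates)
print(best, scores)
```

In this toy example the tweet alone shares no terms with either description, so the decision rests entirely on the user's other posts, which is the intuition behind enriching profiles with content from a second network.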
