Microblog semantic context retrieval system based on linked open data and graph-based theory

We present a novel information retrieval system for context similarity retrieval in microblogging platforms. We present a method for extracting entities and linking them to DBpedia concepts. We contextualize all matched concepts using a graph centrality property by defining a new weighting factor. We present two algorithms that compute semantic similarity by considering the weights of concepts and of their related concepts. We use a real Twitter dataset to show the effectiveness of our system.

Microblogging platforms have emerged as large collections of short documents. Retrieving short text effectively is a significant research challenge owing to several factors: creative language usage, high contextualization, the informal nature of microblog posts, and the limited length of this form of communication. As a result, microblog retrieval systems suffer from data sparseness and the semantic gap. Users compose tweets with few terms, and those terms often do not include the query terms, so many relevant tweets are not retrieved and users' information needs are not accurately met. To overcome these two problems, recent studies on content-based microblog search have focused on adding semantics to microposts by linking short text to knowledge base resources. Previous studies use a bag-of-concepts representation, linking named entities to their corresponding knowledge base concepts. However, a bag-of-concepts representation considers only the concepts that match named entities and assumes that all concepts are equivalent and independent. In this paper, we therefore present a graph-of-concepts method that considers the relationships among the concepts that match named entities in short text and their related concepts. The method contextualizes each concept in the graph by leveraging the linked nature of DBpedia as a Linked Open Data knowledge base, together with graph-based centrality theory. Furthermore, we propose a similarity measure that computes the similarity between two graphs (a query graph and a tweet graph) by considering the relationships between the contextualized concepts. Finally, we report experimental results on a real Twitter dataset that demonstrate the effectiveness of our system.
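The graph-of-concepts idea can be illustrated with a minimal sketch. The abstract does not specify the weighting factor or the similarity measure, so the code below swaps in two standard stand-ins as assumptions: normalized degree centrality as the concept weight, and a weighted Jaccard overlap as the graph similarity. The function names, the concept labels, and the edges are hypothetical and are not taken from DBpedia or from the paper.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Weight each concept by normalized degree centrality.

    Assumption: degree centrality stands in for the paper's
    (unspecified) centrality-based weighting factor.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n <= 1:
        return {concept: 0.0 for concept in neighbors}
    return {concept: len(ns) / (n - 1) for concept, ns in neighbors.items()}


def weighted_similarity(weights_a, weights_b):
    """Weighted Jaccard overlap: sum of minima over sum of maxima.

    Assumption: this is an illustrative measure, not the paper's.
    """
    concepts = set(weights_a) | set(weights_b)
    shared = sum(min(weights_a.get(c, 0.0), weights_b.get(c, 0.0)) for c in concepts)
    total = sum(max(weights_a.get(c, 0.0), weights_b.get(c, 0.0)) for c in concepts)
    return shared / total if total else 0.0


# Hypothetical concept graphs: matched concepts plus related concepts.
query_edges = [("Barack_Obama", "United_States"), ("Barack_Obama", "White_House")]
tweet_edges = [("White_House", "United_States"), ("White_House", "Washington_DC")]

query_weights = degree_centrality(query_edges)
tweet_weights = degree_centrality(tweet_edges)
score = weighted_similarity(query_weights, tweet_weights)
print(round(score, 3))
```

Unlike a bag-of-concepts comparison, the two texts here receive a nonzero score through shared related concepts, and a concept that is central in both graphs contributes more than a peripheral one.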
