RAProp: ranking tweets by exploiting the tweet/user/web ecosystem and inter-tweet agreement

The increasing popularity of Twitter makes better assessment of the trustworthiness and relevance of tweets increasingly important for search. However, because tweets are limited in size, it is hard to extract ranking measures from the tweets' content alone. We present a novel ranking method called RAProp, which combines two orthogonal measures of the relevance and trustworthiness of a tweet. The first, called the Feature Score, measures the trustworthiness of the source of the tweet by extracting features from a three-layer Twitter ecosystem consisting of users, tweets, and webpages. The second, called agreement analysis, estimates the trustworthiness of the content of a tweet by analyzing whether that content is independently corroborated by other tweets. We view the candidate result set of tweets as the vertices of a graph, with the edges measuring the estimated agreement between each pair of tweets. The Feature Score is propagated over this agreement graph to compute the top-k tweets that have both trustworthy sources and independent corroboration. An evaluation of our method on 16 million tweets from the TREC 2011 Microblog Dataset shows that, for top-30 precision, we achieve 53% better precision than the current best-performing method on the dataset, and an improvement of 300% over current Twitter Search.
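The propagation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the Jaccard token overlap used as the agreement measure, the single propagation pass, the additive update, and the names `agreement` and `rank_by_propagation` are all assumptions made for illustration, and the feature scores are taken as given rather than learned from user, tweet, and webpage features.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return set(text.lower().split())


def agreement(a, b):
    """Hypothetical agreement measure: Jaccard overlap of the token sets."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_by_propagation(tweets, feature_scores, k=3):
    """Propagate feature scores once over the pairwise agreement graph.

    Each tweet keeps its own feature score and receives, from every other
    tweet, that tweet's feature score weighted by their agreement
    (assumed update rule). Returns the top-k (tweet, score) pairs.
    """
    n = len(tweets)
    scores = list(feature_scores)
    for i, j in combinations(range(n), 2):
        w = agreement(tweets[i], tweets[j])
        # Edges are undirected, so the weight applies in both directions.
        scores[i] += w * feature_scores[j]
        scores[j] += w * feature_scores[i]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(tweets[i], scores[i]) for i in order[:k]]


tweets = [
    "quake hits city center",
    "quake hits city suburbs",
    "buy cheap watches now",
]
feature_scores = [0.6, 0.5, 0.7]
ranked = rank_by_propagation(tweets, feature_scores)
```

In this toy input the third tweet has the highest feature score but no corroboration, so the two mutually agreeing tweets overtake it after propagation. This mirrors the stated goal of ranking tweets that have both trustworthy sources and independent corroboration.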
