Rhea: Adaptively sampling authoritative content from social activity streams

Processing the full activity stream of a social network in real time is often prohibitive in terms of both storage and computational cost. One way to work around this problem is to sample the social activity and feed that sample into applications such as content recommendation, opinion mining, or sentiment analysis. In this paper, we study the problem of extracting samples of authoritative content from a social activity stream. Specifically, we propose an adaptive stream sampling approach, termed Rhea, that processes a stream of social activity in real time and samples the content of users who are more likely to provide influential information. To the best of our knowledge, Rhea is the first algorithm that dynamically adapts over time to account for evolving trends in the activity stream. Thus, we are able to capture high-quality content from emerging users that contemporary whitelist-based methods ignore. We evaluate Rhea on data from two popular social networks, reaching up to half a billion posts. Our results show that Rhea significantly outperforms previously proposed methods in terms of both recall and precision, while also offering remarkably more accurate ranking.
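The difference between a fixed whitelist and a sampler that adapts to the stream can be illustrated with a toy sketch. This is not Rhea's algorithm, which the abstract does not describe. The class name, the use of mentions as an authority signal, the exponential decay, and the parameters `k` and `decay` are all assumptions made purely for illustration.

```python
from collections import defaultdict


class AdaptiveAuthoritySampler:
    """Toy sampler (hypothetical, not Rhea): keep a post only if its author
    is currently among the k users with the highest decayed mention score."""

    def __init__(self, k=1, decay=0.5):
        self.k = k                        # how many authors count as "authoritative"
        self.decay = decay                # factor applied to every score per post
        self.scores = defaultdict(float)  # user -> decayed mention score

    def observe(self, author, mentioned):
        """Process one post and return True if the post is sampled."""
        # Age all scores so that old activity gradually loses weight.
        # (O(number of users) per post; acceptable only for a toy example.)
        for user in list(self.scores):
            self.scores[user] *= self.decay
        # Assumed authority signal: being mentioned by someone else.
        for user in mentioned:
            if user != author:
                self.scores[user] += 1.0
        # Recompute the current top-k authors from the updated scores.
        top = sorted(self.scores, key=self.scores.get, reverse=True)[: self.k]
        return author in top and self.scores[author] > 0


# Each stream item is (author, users mentioned in the post).
stream = [
    ("alice", ["bob"]),
    ("bob", []),
    ("carol", ["dave"]),
    ("erin", ["dave"]),
    ("bob", []),
    ("dave", []),
]
sampler = AdaptiveAuthoritySampler(k=1, decay=0.5)
results = [sampler.observe(author, mentioned) for author, mentioned in stream]
print(results)  # [False, True, False, False, False, True]
```

In this run, bob's first post is sampled while he holds the top score, but his second post is skipped once dave, an emerging user, overtakes him. A static whitelist containing only bob would have sampled both of bob's posts and missed dave entirely.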
