Structural Trend Analysis for Online Social Networks

The identification of popular and important topics discussed in social networks is crucial for a better understanding of societal concerns. It also helps users stay on top of trends without having to sift through vast amounts of shared information. Trend detection methods introduced so far have not used the network topology and have therefore been unable to distinguish viral topics from topics that are diffused mostly through the news media. To address this gap, we propose two novel structural trend definitions, coordinated and uncoordinated trends, that use friendship information to identify topics discussed among clustered and distributed users, respectively. Our analyses and experiments show that structural trends are significantly different from traditional trends and provide new insights into the way people share information online. We also propose a sampling technique for structural trend detection and prove that it yields a gain in efficiency while keeping the error within an acceptable bound. Experiments performed on a Twitter data set of 41.7 million nodes and 417 million posts show that even with a sampling rate of 0.005, the average precision is 0.93 for coordinated trends and 1 for uncoordinated trends.
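The abstract does not spell out the scoring functions, so the sketch below assumes one plausible reading of "clustered" versus "distributed" users. It defines a topic's coordinated score as the number of friend pairs among the users who mention it and its uncoordinated score as the number of non-friend pairs among them. It also shows a simple uniform user-sampling estimator that rescales the sampled pair counts by 1/p², since a pair survives only when both of its users are sampled. This estimator is an illustrative stand-in and not necessarily the authors' sampling scheme or error bound. The function names `structural_scores`, `top_k`, and `sampled_scores` are hypothetical.

```python
import random
from itertools import combinations


def structural_scores(mentions, friends):
    """Assumed scoring: {topic: (coordinated, uncoordinated)} pair counts.

    mentions: dict mapping topic -> set of user ids that posted about it
    friends:  set of frozenset({u, v}) undirected friendship edges
    """
    scores = {}
    for topic, users in mentions.items():
        n = len(users)
        total_pairs = n * (n - 1) // 2
        # Pairs of mentioners who are friends (a "clustered" signal).
        connected = sum(
            1 for u, v in combinations(users, 2) if frozenset((u, v)) in friends
        )
        # Remaining pairs are mentioners with no friendship link ("distributed").
        scores[topic] = (connected, total_pairs - connected)
    return scores


def top_k(scores, k, kind="coordinated"):
    """Return the k topics with the highest coordinated or uncoordinated score."""
    idx = 0 if kind == "coordinated" else 1
    return sorted(scores, key=lambda t: scores[t][idx], reverse=True)[:k]


def sampled_scores(mentions, friends, p, seed=0):
    """Hypothetical estimator: keep each user with probability p, count pairs
    among sampled users only, and rescale by 1/p**2 so each estimate is
    unbiased (a pair is kept exactly when both users are sampled)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    all_users = set().union(*mentions.values())
    # Sort before sampling so the outcome depends only on the seed.
    sample = {u for u in sorted(all_users) if rng.random() < p}
    raw = structural_scores(
        {t: users & sample for t, users in mentions.items()}, friends
    )
    return {t: (c / p**2, u / p**2) for t, (c, u) in raw.items()}


if __name__ == "__main__":
    mentions = {
        "#local": {1, 2, 3},  # three mutual friends
        "#news": {4, 5, 6},   # three users with no links among them
    }
    friends = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (4, 7)]}
    scores = structural_scores(mentions, friends)
    print(scores)
    print(top_k(scores, 1, "coordinated"), top_k(scores, 1, "uncoordinated"))
```

Because every sampled count is divided by the same factor, the rescaling does not change the ranking within one sample; it only matters when estimates are compared against an absolute threshold. Enumerating all pairs is quadratic in the number of users per topic, so a practical implementation would more likely walk each mentioner's friend list instead.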
