Tracking Triadic Cardinality Distributions for Burst Detection in Social Activity Streams

In everyday life, we often observe unusually frequent interactions among people before or during important events; e.g., people send and receive more greetings on Christmas Day than on regular days. We also observe that some videos suddenly go viral through people's sharing in online social networks (OSNs). Do these seemingly different phenomena share a common structure? All of them are associated with sudden surges of user activity in networks, which we call bursts in this work. We find that the emergence of a burst is accompanied by the formation of triangles in networks. This finding motivates us to propose a new and robust method to detect bursts in OSNs. We first introduce a new measure, the triadic cardinality distribution, which gives the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, within a network. We demonstrate that this distribution not only changes when a burst occurs, but is also robust in that it is immune to common spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can reliably detect bursts in OSNs. To avoid handling the massive activity data generated by OSN users during triadic tracking, we design an efficient sample-estimate solution that provides a maximum likelihood estimate of the triadic cardinality distribution from sampled data. Extensive experiments conducted on real data demonstrate the usefulness of the triadic cardinality distribution and the effectiveness of our sample-estimate solution.
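To make the measure concrete, the following minimal sketch computes a triadic cardinality distribution exactly on a small undirected graph. It is an illustration under stated assumptions, not the sample-estimate solution described above: the function names and the toy edge list are hypothetical, the graph is treated as a simple undirected edge list, and only nodes that appear in at least one edge are counted.

```python
from collections import Counter
from itertools import combinations


def triadic_cardinalities(edges):
    """Map each node to the number of triangles it belongs to.

    Assumption: `edges` is an iterable of (u, v) pairs describing a simple
    undirected graph; self-loops are ignored and duplicate edges collapse.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbours
        # that are themselves connected.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts


def triadic_cardinality_distribution(edges):
    """Return {k: fraction of nodes that belong to exactly k triangles}."""
    counts = triadic_cardinalities(edges)
    if not counts:
        return {}
    n = len(counts)
    hist = Counter(counts.values())
    return {k: hist[k] / n for k in sorted(hist)}


# Hypothetical toy data: one triangle (a, b, c) plus a path c - d - e.
before = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
# One extra interaction closes a second triangle (c, d, e).
after = before + [("c", "e")]

dist_before = triadic_cardinality_distribution(before)
dist_after = triadic_cardinality_distribution(after)
```

In this toy example a single new edge moves probability mass from cardinality 0 toward cardinalities 1 and 2, which is the kind of shift that tracking the distribution over time is meant to expose. Exact counting like this scales poorly with node degree, which is why a sampling-based estimate is attractive for real activity streams.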
