Aggregate Characterization of User Behavior in Twitter and Analysis of the Retweet Graph

Most previous analysis of Twitter user behavior has focused on individual information cascades and the social followers graph, in which the nodes for two users are connected if one follows the other. We instead study aggregate user behavior and the retweet graph, with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power-law hazard function; that the tweet rate distribution, although asymptotically power-law, exhibits a lognormal cutoff over finite sample intervals; and that the inter-tweet interval distribution is a power law with exponential cutoff. The retweet graph is small-world and scale-free, like the social graph, but it is less disassortative and has much stronger clustering. These differences are consistent with the retweet graph capturing users' real-world social relationships and mutual trust better than the social graph does. Beyond understanding and modeling human communication patterns and social networks, we discuss applications to alternative, decentralized microblogging systems, both for predicting real-world performance and for detecting spam.
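To make the link between a power-law hazard function and a type-II discrete Weibull concrete, the following minimal sketch builds the probability mass function directly from the hazard. It assumes the hazard takes the form h(x) = c * x^(beta - 1) for x = 1, 2, ..., capped at 1 so that it remains a valid probability. The function name, the cap, and the parameter values c = 0.5 and beta = 0.5 are illustrative assumptions, not values taken from the paper.

```python
import math


def type2_discrete_weibull_pmf(c, beta, max_x):
    """Return [P(X = 1), ..., P(X = max_x)] for a power-law hazard.

    Assumed hazard (illustrative): h(x) = c * x**(beta - 1), capped at 1.
    The hazard is the probability of stopping at x given survival to x,
    so P(X = x) = h(x) * prod_{k < x} (1 - h(k)).
    """
    pmf = []
    survival = 1.0  # P(X >= x), updated as x increases
    for x in range(1, max_x + 1):
        hazard = min(1.0, c * x ** (beta - 1.0))
        pmf.append(survival * hazard)
        survival *= 1.0 - hazard
    return pmf


# Hypothetical parameters: beta < 1 gives a hazard that decreases with x,
# i.e. the more a user has tweeted, the less likely they are to stop.
pmf = type2_discrete_weibull_pmf(c=0.5, beta=0.5, max_x=10_000)
total_mass = sum(pmf)
```

With beta < 1 the hazard decays as a power law, so the resulting distribution has a tail heavier than a geometric distribution (constant hazard) but lighter than a pure power law.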
