The self-feeding process: a unifying model for communication dynamics in the web

How often do individuals perform a given communication activity on the Web, such as posting comments on blogs or news? Could we have a generative model to create communication events with realistic inter-event time distributions (IEDs)? Which properties should we strive to match? Current literature has seemingly contradictory results for IEDs: some studies claim good fits with power laws; others with non-homogeneous Poisson processes. Given these two approaches, we ask: which is the correct one? Can we reconcile them? We show here that, surprisingly, both approaches are correct, being corner cases of the proposed Self-Feeding Process (SFP). We show that the SFP (a) unifies the two views, generating power-law tails (including the so-called "top-concavity" that real data exhibit) as well as short-term Poisson behavior; (b) avoids the "i.i.d. fallacy", which none of the prevailing models has addressed; and (c) is extremely parsimonious, usually requiring only one parameter and, in general, at most two. Experiments conducted on eight large, diverse real datasets (e.g., YouTube and blog comments, e-mails, and SMS messages) reveal that the SFP mimics their properties very well.
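The abstract does not state the SFP update rule, so the following is only a minimal illustrative sketch of what a self-feeding generator could look like. It assumes a one-parameter Markovian rule in which each inter-event time is drawn from an exponential distribution whose mean is the previous inter-event time plus a constant offset. The function name `self_feeding_gaps`, the parameter `mu`, and the offset `mu / e` are assumptions made for illustration, not details confirmed by the text above.

```python
import math
import random


def self_feeding_gaps(n, mu, seed=None):
    """Generate n inter-event times with a self-feeding (Markovian) rule.

    Assumed rule (hypothetical, for illustration only):
      gap[0] = mu
      gap[t] ~ Exponential(mean = gap[t-1] + mu / e)
    Each gap depends on the previous one, so the gaps are not i.i.d.
    """
    if n <= 0:
        return []
    rng = random.Random(seed)
    offset = mu / math.e          # assumed constant; keeps the mean positive
    gaps = [mu]
    for _ in range(n - 1):
        mean = gaps[-1] + offset
        # random.expovariate takes a rate, i.e. 1 / mean
        gaps.append(rng.expovariate(1.0 / mean))
    return gaps


if __name__ == "__main__":
    sample = self_feeding_gaps(10_000, mu=60.0, seed=1)
    ordered = sorted(sample)
    print("median gap:", ordered[len(ordered) // 2])
    print("max gap:", ordered[-1])
```

Because each draw feeds the next one's mean, a long gap tends to be followed by another long gap and a short gap by another short one, which is the kind of dependence that an i.i.d. model of inter-event times would miss.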
