Universal and Distinct Properties of Communication Dynamics

As information systems advance, means of communication are becoming cheaper, faster, and more widely available. Today, millions of people carrying smartphones or tablets can communicate at practically any time and from anywhere. They can access their e-mail, comment on weblogs, watch, post, and comment on videos and photos, and make phone calls or send text messages almost ubiquitously. Given this scenario, in this article we tackle a fundamental aspect of this new era of communication: how do the time intervals between communication events behave across different technologies and means of communication? Are there universal patterns in the Inter-Event Time Distribution (IED)? How do inter-event times differ among particular technologies? To answer these questions, we analyzed eight datasets of real, modern communication data and found four well-defined patterns present in all eight. Moreover, we propose the Self-Feeding Process (SFP) to generate inter-event times between communications. The SFP is an extremely parsimonious point process that requires at most two parameters and generates inter-event times with all the universal properties we observed in the data. We also show three potential applications of the SFP: as a framework for generating synthetic datasets containing realistic communication events for any of the analyzed means of communication, as a technique for detecting anomalies, and as a building block for more specific models that aim to capture the particularities of each analyzed system.
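
To make the idea of a two-parameter generator of inter-event times concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the SFP recurrence, so the form used here is an assumption: each latent gap is drawn from an exponential distribution whose mean is the previous latent gap plus `mu / e`, and the emitted gap is the latent gap raised to `1 / rho`. The names `sfp_inter_event_times`, `mu`, and `rho` are hypothetical and chosen only for illustration.

```python
import math
import random


def sfp_inter_event_times(n, mu, rho=1.0, seed=None):
    """Generate n inter-event times from an assumed self-feeding recurrence.

    ASSUMPTION (not stated in the abstract): the latent gap starts at mu and
    each new latent gap is exponential with mean (previous gap + mu / e).
    ASSUMPTION: rho stands in for the optional second parameter and reshapes
    the emitted gap as latent ** (1 / rho); rho = 1 leaves it unchanged.
    """
    if n < 0 or mu <= 0 or rho <= 0:
        raise ValueError("n must be >= 0, and mu and rho must be positive")
    rng = random.Random(seed)
    latent = mu  # first latent gap is set to the location parameter
    gaps = []
    for _ in range(n):
        mean = latent + mu / math.e          # the previous gap feeds the next mean
        latent = rng.expovariate(1.0 / mean)  # expovariate takes a rate (1 / mean)
        gaps.append(latent ** (1.0 / rho))
    return gaps


if __name__ == "__main__":
    sample = sfp_inter_event_times(5, mu=60.0, rho=1.0, seed=42)
    print(sample)
```

Because each gap sets the scale of the next one, a short gap tends to be followed by another short gap and a long gap by another long one, which is one simple way to obtain bursty behavior from very few parameters.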
