Clean up or mess up: the effect of sampling biases on measurements of degree distributions in mobile phone datasets

Mobile phone data have been used extensively in recent years to study social behavior. However, most of these studies are based on partial data whose coverage is limited in both space and time. In this paper, we point out that the bias due to limited temporal coverage may have an important influence on the results of the analyses performed. In particular, we observe significant qualitative and quantitative differences in the degree distribution of the network depending on how the dataset is pre-processed, and we present a possible explanation for the emergence of Double Pareto LogNormal (DPLN) degree distributions in temporal data.
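A minimal Python sketch can illustrate how the measured degree distribution depends on pre-processing. It assumes call records of the form `(caller, callee, day)` and a hypothetical "clean-up" filter, `keep_persistent_users`, that keeps only users active near both ends of the observation window. Neither the record format nor the filter is taken from the paper, and the preprocessing actually studied there may differ.

```python
from collections import Counter, defaultdict


def degree_distribution(records):
    """Return {k: number of users with k distinct contacts} for the
    undirected contact network built from (caller, callee, day) records."""
    neighbours = defaultdict(set)
    for a, b, _day in records:
        if a != b:  # ignore self-calls
            neighbours[a].add(b)
            neighbours[b].add(a)
    return dict(sorted(Counter(len(v) for v in neighbours.values()).items()))


def keep_persistent_users(records, n_days, margin):
    """Hypothetical clean-up filter (an assumption, not the paper's method):
    keep only calls between users whose first activity falls in the first
    `margin` days and whose last activity falls in the last `margin` days."""
    first, last = {}, {}
    for a, b, day in records:
        for u in (a, b):
            first[u] = min(first.get(u, day), day)
            last[u] = max(last.get(u, day), day)
    keep = {u for u in first if first[u] < margin and last[u] >= n_days - margin}
    return [r for r in records if r[0] in keep and r[1] in keep]


if __name__ == "__main__":
    # Toy records over a 10-day window (days 0..9); purely illustrative.
    records = [("a", "b", 0), ("a", "c", 1), ("b", "c", 9),
               ("a", "d", 5), ("e", "a", 9), ("c", "a", 9)]
    print(degree_distribution(records))  # all users kept
    print(degree_distribution(keep_persistent_users(records, 10, 2)))  # filtered
```

On this toy input the unfiltered network gives `{1: 2, 2: 2, 4: 1}`, while the filtered one gives `{2: 3}`: users `d` and `e`, seen only in part of the window, disappear, and the remaining users lose the contacts they had with them.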
