Scalable and flexible clustering solutions for mobile phone-based population indicators

Mobile phones have reached an unprecedented rate of penetration across the world. These devices produce large amounts of data that have been used in many different domains. In this work, we use mobile call data to monitor the presence of individuals region by region. Traditionally, this activity has been conducted by means of censuses and surveys. Today, technology opens new possibilities for analysing individual calling behaviour to determine the number of residents, commuters and visitors moving through an area. To this end, we provide a completely unsupervised clustering technique that can cluster data by exploiting an arbitrary similarity metric. We apply this technique and define a metric to analyse mobile calls and individual profiles. When results are compared with real census data, the approach provides better population estimates than the state of the art, and it greatly reduces the execution time of previous work by some of the authors of this paper. The scalability and flexibility of the proposed framework enable novel scenarios for characterizing people by means of data derived from mobile users, ranging from nearly real-time estimation of presence to the definition of complex, uncommon user archetypes.
