Community Detection in Who-calls-Whom Social Networks

Proceedings of the 20th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK), Regensburg, Germany, 2018

Mobile phone service providers all over the globe collect large volumes of data. Because these datasets record significant information, they offer great potential for knowledge discovery. The processing pipeline contains several important steps, such as data preparation, transformation, and knowledge discovery, so a holistic approach is required to avoid costly ETL operations across heterogeneous systems. In this work, we present the design and implementation of knowledge discovery from call detail record (CDR) mobile phone data using the Apache Spark distributed engine. We focus on the community detection problem, which is extremely challenging and has many practical applications. We run the Louvain community detection algorithm on Apache Spark over a cluster of machines to study the scalability and efficiency of the proposed methodology. The experimental evaluation is based on real-world mobile phone data.
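To make the pipeline concrete, the following is a minimal single-machine sketch, not the Spark implementation described in the paper. It assumes a hypothetical CDR layout of (caller, callee, duration) rows, aggregates the rows into an undirected who-calls-whom graph weighted by call count, and computes modularity, the quality function that the Louvain algorithm greedily optimizes. The record values, function names, and the choice of call count as edge weight are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical CDR rows: (caller, callee, duration_seconds). Illustrative only.
cdr_records = [
    ("A", "B", 60), ("B", "A", 30), ("A", "C", 45), ("B", "C", 20),
    ("D", "E", 50), ("E", "F", 40), ("D", "F", 35), ("C", "D", 5),
]


def build_call_graph(records):
    """Aggregate CDR rows into undirected edges weighted by number of calls.

    Assumption: call direction and duration are ignored; self-calls are dropped.
    """
    weights = defaultdict(float)
    for caller, callee, _duration in records:
        if caller == callee:
            continue
        edge = tuple(sorted((caller, callee)))
        weights[edge] += 1.0
    return dict(weights)


def modularity(edge_weights, community_of):
    """Modularity Q = sum over communities c of in_c / m - (deg_c / (2m))^2.

    m is the total edge weight, in_c the weight of edges inside c,
    and deg_c the summed weighted degree of the nodes in c.
    """
    m = sum(edge_weights.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (u, v), w in edge_weights.items():
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


graph = build_call_graph(cdr_records)

# Two candidate partitions: two groups of callers versus one big community.
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(graph, two_groups)
q_one = modularity(graph, one_group)
print(f"two groups: Q = {q_two:.4f}")
print(f"one group:  Q = {q_one:.4f}")
```

In this toy graph the two-group partition scores higher than the single community, which is the kind of improvement Louvain searches for when it moves nodes between communities. A distributed version would replace the in-memory dictionaries with partitioned edge and vertex collections.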
