Behavioral Insights for Development from Mobile Network Big Data: Enlightening Policy Makers on the State of the Art

The discipline of Information and Communication Technologies for Development (ICT4D) gained traction against the backdrop of exponential growth in mobile phone connectivity. There has been a multitude of projects, services, applications and even policies that aim to leverage the mobile phone to contribute to the broader development of society. This has gone hand in hand with much academic interest in understanding the effects of mobile phone connectivity on development. However, it is only of late that attention is being paid to posing development-related questions to the basic data artifacts that society leaves behind when consuming mobile phone services. These artifacts fall under the class of Transaction Generated Data (TGD): they are recorded by mobile phone operators, for the purposes of billing and network optimization, when certain events occur (e.g. when one makes a call). Given the volume of TGD that is produced, it also falls under the category of Big Data. Big data is an amorphous category that could, for instance, include data from an astronomical observatory or the full text of all the digitized books from the 20th century. Like many others, the 2011 McKinsey Global Institute report on Big Data focuses solely on the "big" in defining the term: "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze" (Manyika et al., 2011). This definition is intentionally subjective and incorporates a moving threshold for how big a dataset needs to be in order to be considered big data, with the implicit assumption that as technology advances over time, the size of datasets that qualify as big data will also increase. Note also that the definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry.
With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes). Gartner (2011) introduced two further definitional characteristics beyond volume, namely velocity and variety. Velocity refers to the speed at which data is generated, assessed and analyzed. Variety encompasses the fact that data can exist as different media (text, audio, video) and come in different formats (structured and unstructured). Value is a fourth definitional characteristic that acknowledges the potentially high socio-economic value that may be generated by Big Data (Jones, 2012). Included within the scope of big data is the category of transaction-generated data (TGD), also sometimes described as "data exhaust." This category was first discussed in 1991, though the term then used was transaction-generated information. The value of this subset of big data is that it is directly connected to human behavior, and its accuracy is generally high because the data is generated for a purpose, such as the completion of a telephone call or a commercial transaction. TGD has great potential for broader development and is already being leveraged to predict flu trends, forecast unemployment, and understand societal ties and overall socio-economic well-being. However, unlike in developed countries, the only streams of comprehensive big data with wide socio-economic coverage in developing countries are those generated by telecom networks, because commercial banks and supermarkets, for example, do not reach a majority of people. Even though internet access is growing fast in developing economies, as noted in the 2013 Measuring the Information Society report by ITU, overall household internet penetration in developing economies was expected to be only 28% as of end 2013, as opposed to almost 80% in developed economies. Basic mobile subscriptions, however, have almost peaked at 96% globally (ITU, 2013).
Therefore, in the near term, it is non-Internet related mobile network big data that has the widest socio-economic coverage. Such data is already being utilized for development and monitoring, not just in developed economies but also in developing economies. The focus of this paper is therefore mainly on mobile network big data for development. This policy paper serves to enlighten policy makers in developing economies as to the range of behavioral insights on mobility, connectivity and consumption that can be extracted from mobile network TGD. Importantly, this paper also addresses how these insights can be leveraged by multiple policy domains, inter alia transport, health, and economic development.

[1]  Vincent D. Blondel,et al.  Evaluating socio-economic state of a country analyzing airtime credit and mobile phone datasets , 2013, ArXiv.

[2]  T. Graepel,et al.  Private traits and attributes are predictable from digital records of human behavior , 2013, Proceedings of the National Academy of Sciences.

[3]  Eng Yeow Cheu,et al.  Studying Intercity Travels and Traffic Using Cellular Network Data , 2013 .

[4]  Carlo Ratti,et al.  Transportation mode inference from anonymized and aggregated mobile phone call detail records , 2010, 13th International IEEE Conference on Intelligent Transportation Systems.

[5]  Xueliang Li,et al.  On a Relation Between , 2012 .

[6]  David Lazer,et al.  Inferring friendship network structure by using mobile phone data , 2009, Proceedings of the National Academy of Sciences.

[7]  J. Blumenstock,et al.  Divided We Call: Disparities in Access and Use of Mobile Phones in Rwanda , 2012 .

[8]  Joshua Evan Blumenstock Using mobile phone data to measure the ties between nations , 2011, iConference '11.

[9]  Vanessa Frías-Martínez,et al.  On the relation between socio-economic status and physical mobility , 2012, Inf. Technol. Dev..

[10]  Sougata Mukherjea,et al.  Social ties and their relevance to churn in mobile telecom networks , 2008, EDBT '08.

[11]  David L. Smith,et al.  Quantifying the Impact of Human Mobility on Malaria , 2012, Science.

[12]  Lisa Amini,et al.  Challenges and results in city-scale sensing , 2011, 2011 IEEE SENSORS Proceedings.

[13]  R. Ahas,et al.  Location based services—new challenges for planning and public administration? , 2005 .

[14]  Charles Anderson,et al.  The end of theory: The data deluge makes the scientific method obsolete , 2008 .

[15]  Eric Gossett,et al.  Big Data: A Revolution That Will Transform How We Live, Work, and Think , 2015 .

[16]  Erik Brynjolfsson,et al.  Big data: the management revolution. , 2012, Harvard business review.

[17]  Petter Holme,et al.  Predictability of population displacement after the 2010 Haiti earthquake , 2012, Proceedings of the National Academy of Sciences.

[18]  Marco Luca Sbodio,et al.  AllAboard: A System for Exploring Urban Mobility and Optimizing Public Transport Using Cellphone Data , 2013, ECML/PKDD.

[19]  L. Bengtsson,et al.  Improved Response to Disasters and Outbreaks by Tracking Population Movements with Mobile Phone Network Data: A Post-Earthquake Geospatial Study in Haiti , 2011, PLoS medicine.

[20]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[21]  Sigon Kim,et al.  ORIGIN-DESTINATION ESTIMATION USING CELLULAR PHONE BS INFORMATION , 2005 .

[22]  Margaret Martonosi,et al.  Identifying Important Places in People's Lives from Cellular Network Data , 2011, Pervasive.

[23]  Johan Wideberg,et al.  Deriving origin destination data from a mobile phone network , 2007 .

[24]  Joshua E. Blumenstock,et al.  Inferring Patterns of Internal Migration from Mobile Phone Call Records: Evidence from Rwanda , 2022, Information Technology for Development.

[25]  Hal R. Varian,et al.  Big Data: New Tricks for Econometrics , 2014 .

[26]  Vanessa Frías-Martínez,et al.  On the relationship between socio-economic factors and cell phone usage , 2012, ICTD.

[27]  Kabir Kumar,et al.  Can digital footprints lead to greater financial inclusion , 2012 .

[28]  Liang Liu,et al.  Estimating Origin-Destination Flows Using Mobile Phone Location Data , 2011, IEEE Pervasive Computing.

[29]  Judith Bayard Cushing,et al.  Beyond Big Data? , 2013, Comput. Sci. Eng..

[30]  J. Manyika Big data: The next frontier for innovation, competition, and productivity , 2011 .

[31]  César A. Hidalgo,et al.  Unique in the Crowd: The privacy bounds of human mobility , 2013, Scientific Reports.

[32]  Jukka-Pekka Onnela,et al.  Geographic Constraints on Social Network Groups , 2010, PloS one.

[33]  Paul Ohm Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization , 2009 .

[34]  D. Boyd,et al.  CRITICAL QUESTIONS FOR BIG DATA , 2012 .

[35]  P. Olivier,et al.  Socio-Geography of Human Mobility: A Study Using Longitudinal Mobile Phone Data , 2012, PloS one.

[36]  S. Strogatz,et al.  Redrawing the Map of Great Britain from a Network of Human Interactions , 2010, PloS one.

[37]  Jonathan Magnusson,et al.  Subscriber classification within telecom networks utilizing big data technologies and machine learning , 2012, BigMine '12.