Combining disparate data sources for improved poverty prediction and mapping

Significance

Spatially fine-grained poverty maps are essential for improved diagnosis and policy planning, especially in view of the Sustainable Development Goals. “Big Data” sources such as call data records and satellite imagery have shown promise in providing intercensal statistics. This study outlines a computational framework that efficiently combines disparate data sources, such as environmental data and mobile data, to provide more accurate predictions of poverty and its individual dimensions for the finest spatial microregions in Senegal. These predictions are validated using concurrent census data.

More than 330 million people are still living in extreme poverty in Africa. Timely, accurate, and spatially fine-grained baseline data are essential to determining policy in favor of reducing poverty. The potential of “Big Data” to estimate socioeconomic factors in Africa has been demonstrated. However, most current studies are limited to using a single data source. We propose a computational framework to accurately predict the Global Multidimensional Poverty Index (MPI) at the finest spatial granularity, covering 552 communes in Senegal, using environmental data (related to food security, economic activity, and accessibility to facilities) and call data records (capturing individualistic, spatial, and temporal aspects of people). Our framework is based on Gaussian Process regression, a Bayesian learning technique that provides the uncertainty associated with each prediction. We perform model selection using elastic net regularization to prevent overfitting. Our results show empirically that combining disparate data yields superior accuracy (Pearson correlation of 0.91). We also use our approach to accurately predict important dimensions of poverty: health, education, and standard of living (Pearson correlation of 0.84–0.86). All predictions are validated using deprivations calculated from census data. Our approach can be used to generate poverty maps frequently, and its diagnostic nature is likely to assist policy makers in designing better interventions for poverty eradication.
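To make the modeling idea concrete, the following is a minimal sketch of Gaussian Process regression with predictive uncertainty, scored by Pearson correlation on held-out points. It is not the authors' implementation: the squared-exponential kernel, the fixed hyperparameters, the synthetic features, and the names `rbf_kernel`, `gp_predict`, and `pearson` are all assumptions made for illustration, and the elastic net selection step is omitted.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=0.01, length_scale=1.0):
    """Posterior mean and latent-function variance of a zero-mean GP at x_test."""
    k_train = rbf_kernel(x_train, x_train, length_scale)
    k_train = k_train + noise_var * np.eye(len(x_train))
    k_cross = rbf_kernel(x_train, x_test, length_scale)
    chol = np.linalg.cholesky(k_train)
    # Solve (K + noise * I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    prior_var = rbf_kernel(x_test, x_test, length_scale).diagonal()
    var = prior_var - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

def pearson(u, v):
    """Pearson correlation coefficient between two 1-D arrays."""
    u = u - u.mean()
    v = v - v.mean()
    return float(u @ v / np.sqrt((u @ u) * (v @ v)))

# Synthetic stand-ins (assumption): two features per region and a smooth target.
rng = np.random.default_rng(0)
features = rng.uniform(-2.0, 2.0, size=(80, 2))
target = (np.sin(features[:, 0]) + 0.5 * features[:, 1]
          + rng.normal(0.0, 0.05, size=80))

# Fit on 60 regions, predict the remaining 20, and center the target
# because the GP prior above has zero mean.
offset = target[:60].mean()
mean, var = gp_predict(features[:60], target[:60] - offset, features[60:])
mean = mean + offset
r = pearson(mean, target[60:])
print(f"held-out Pearson r = {r:.3f}, mean predictive std = {np.sqrt(var).mean():.3f}")
```

The per-region variance is what distinguishes this approach from a point-estimate regressor: regions whose features lie far from the training data receive wider uncertainty, which is useful when deciding where a prediction can be trusted.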
