Estimation of Poverty Using Random Forest Regression with Multi-Source Data: A Case Study in Bangladesh

Spatially explicit and reliable data on poverty are critical for both policy makers and researchers. However, such data remain scarce, particularly in developing countries. Existing studies typically use environmental data from individual sources in isolation to estimate poverty, even though poverty is a complex phenomenon that cannot be quantified, in theory or in practice, by a single data type. This study proposes a random forest regression (RFR) model to estimate poverty at 10 km × 10 km spatial resolution by combining features extracted from multiple data sources, including National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) Day/Night Band (DNB) nighttime light (NTL) data, Google satellite imagery, a land cover map, a road map and division headquarters location data. The household wealth index (WI) drawn from the Demographic and Health Surveys (DHS) program was used as the measure of poverty. We trained the RFR model using data from Bangladesh and applied it to both Bangladesh and Nepal to evaluate its accuracy. The results show that the R between the actual and estimated WI in Bangladesh is 0.70, indicating good predictive power for WI estimation. The R of 0.61 between the actual and estimated WI in Nepal also indicates good generalization ability. Furthermore, a negative correlation is observed between the district average WI and the poverty head count ratio (HCR) in Bangladesh, with a Pearson correlation coefficient of -0.6. Using Gini importance, we identify proximity to urban areas as the most important variable for explaining poverty, contributing 37.9% of the explanatory power. Compared to the study that used NTL and Google satellite imagery in isolation to estimate poverty, our method increases the accuracy of estimation. Given that the data we use are globally and publicly available, the methodology reported in this study would also be applicable in other countries or regions to estimate the extent of poverty.
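
The workflow summarized in the abstract (fit a random forest regressor, rank variables by impurity decrease, and compare actual with estimated values) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the feature names, the synthetic data, the tree depth and the number of trees are all assumptions made for illustration, and the importance computed here is the variance-reduction analogue of what the abstract calls Gini importance. In practice a library such as scikit-learn's `RandomForestRegressor` would normally be used instead.

```python
import random
from statistics import fmean

# Hypothetical feature names; the study's real features are not reproduced here.
FEATURES = ["dist_to_urban", "ntl_radiance", "road_density"]

def sse(ys):
    """Sum of squared errors around the mean (node impurity times node size)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y, rows, candidates):
    """Return (gain, feature, threshold) with the largest impurity decrease."""
    parent = sse([y[i] for i in rows])
    best = (0.0, None, None)
    for f in candidates:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def grow(X, y, rows, depth, rng, n_try, importance):
    """Grow one regression tree; a leaf is a float, a split is a tuple."""
    f = None
    if depth > 0 and len(rows) >= 4:
        # Each split considers only a random subset of the features.
        candidates = rng.sample(range(len(X[0])), n_try)
        gain, f, t = best_split(X, y, rows, candidates)
    if f is None:
        return fmean([y[i] for i in rows])
    importance[f] += gain  # accumulate impurity decrease per feature
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t,
            grow(X, y, left, depth - 1, rng, n_try, importance),
            grow(X, y, right, depth - 1, rng, n_try, importance))

def fit_forest(X, y, n_trees=20, max_depth=4, n_try=2, seed=0):
    """Fit bootstrapped trees; return them with normalized importances."""
    rng = random.Random(seed)
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        trees.append(grow(X, y, rows, max_depth, rng, n_try, importance))
    total = sum(importance)
    return trees, [v / total for v in importance]

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def predict(trees, x):
    """Forest prediction is the average of the individual tree predictions."""
    return fmean([predict_tree(tree, x) for tree in trees])

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

# Synthetic stand-in for grid cells: the wealth index falls with distance to
# urban areas, rises with nighttime light, and ignores road density (assumed).
rng = random.Random(42)
X = [[rng.random() for _ in FEATURES] for _ in range(150)]
y = [3.0 * (1.0 - d) + 1.0 * n + rng.gauss(0.0, 0.2) for d, n, _ in X]

trees, importances = fit_forest(X[:100], y[:100])  # train on 100 cells
r_test = pearson(y[100:], [predict(trees, x) for x in X[100:]])  # 50 held out

for name, share in zip(FEATURES, importances):
    print(f"{name}: {share:.1%} of total impurity decrease")
print(f"Pearson r on held-out cells: {r_test:.2f}")
```

Because the synthetic wealth index is driven mainly by the distance feature, that feature should receive the largest importance share, mirroring the kind of ranking reported in the abstract. The figures printed by this sketch describe only the synthetic data and say nothing about the study's actual results.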

[57]  T. Pei,et al.  Responses of Suomi-NPP VIIRS-derived nighttime lights to socioeconomic activity in China’s cities , 2014 .