Disaggregating County-Level Census Data for Population Mapping Using Residential Geo-Objects With Multisource Geo-Spatial Data

Accurate spatialization of socioeconomic data helps in understanding the spatial and temporal distribution of human social development and thus supports future scientific decision-making. This study focuses on population mapping, a classical case of spatializing macro-level socioeconomic data. Traditional population mapping based on coarse grids or administrative divisions such as townships is often deficient in the accuracy of both the spatial pattern and the prediction. In this article, we therefore employ residential geo-objects as the basic mapping units and formalize the problem as a spatial prediction process using machine-learning (ML) methods with high-spatial-resolution (HSR) satellite remote sensing images and multisource geospatial data. Indicators of population spatial density, including the area of residential geo-objects, building existence index, terrain slope, night light intensity, the density of points of interest (POIs) and of the road network from Internet electronic maps, and locational factors such as the distances from roads and rivers, are jointly used to establish the relationship between these multivariable factors and a quantitative index of population density using ML algorithms such as Random Forests and XGBoost. The predicted values of population density from the mined nonlinear regression relation are then used to calculate the disaggregation weight of each unit, and the population count at the scale of residential geo-objects is obtained under the constraint of the total population reported in the census statistics. Experiments in a county area show that the methodology can achieve better results than traditional deterministic methods by reproducing a more accurate and finer geographic population distribution pattern. Meanwhile, it is found that the mapping results may benefit from the multisource geospatial data, and thus the methodological framework can be extended to other areas of socioeconomic data spatialization.
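
The disaggregation step described above can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so this example assumes that the weight of each residential geo-object is its predicted density multiplied by its area, normalized over all units in the county. The function name `disaggregate` and the sample numbers are hypothetical and are not taken from the study.

```python
def disaggregate(total_population, predicted_density, area):
    """Allocate a county total to units in proportion to density * area.

    Assumption (not from the source): weight_i = density_i * area_i / sum_j(density_j * area_j).
    """
    # Negative regression outputs are clipped to zero before weighting.
    raw = [max(d, 0.0) * a for d, a in zip(predicted_density, area)]
    total_raw = sum(raw)
    if total_raw <= 0:
        raise ValueError("predicted densities and areas give no positive weight")
    # Scaling by the county total keeps the sum of unit estimates equal to it.
    return [total_population * r / total_raw for r in raw]


# Hypothetical inputs: three geo-objects, densities in persons per unit area.
estimates = disaggregate(10_000, [50.0, 120.0, 30.0], [2.0, 1.0, 4.0])
```

Because the weights sum to one, the unit estimates always add up to the census total regardless of how well the regression model is calibrated; only the relative magnitudes of the predictions matter.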
