Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia

Evaluating gridded population estimates (e.g., WorldPop, LandScan) or household survey sample designs requires a population census linked to residential locations. Geolocated census microdata, however, are almost never available and are thus best simulated. In this paper, we simulate a close-to-reality population of individuals nested in households geolocated to realistic building locations. Using the R simPop package and ArcGIS, multiple realizations of a geolocated synthetic population are derived from the Namibia 2011 census 20% microdata sample, Namibia census enumeration area boundaries, the Namibia 2013 Demographic and Health Survey (DHS), and dozens of spatial covariates derived from publicly available datasets. Realistic household latitude-longitude coordinates are manually generated based on public satellite imagery. Simulated households are linked to latitude-longitude coordinates by identifying distinct household types with multivariate k-means analysis and modelling a probability surface for each household type using Random Forest machine learning methods. We simulate five realizations of a synthetic population in Namibia’s Oshikoto region, including demographic, socioeconomic, and outcome characteristics at the level of household, woman, and child. Variables in the synthetic population were compared with the 2011 census 20% sample and 2013 DHS data by primary sampling unit/enumeration area. We found that variable distributions in the synthetic population matched the observed distributions and followed expected spatial patterns. We outline a novel process to simulate a close-to-reality microdata census geolocated to realistic building locations in a low- or middle-income country setting to support spatial demographic research and survey methodological development while avoiding disclosure risk of individuals.
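The linking step described above (household types matched to building coordinates through a per-type probability surface) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which uses R simPop and ArcGIS. The function name `assign_households`, the building coordinates, the type labels, and the probability values are all hypothetical; the probabilities stand in for the output of a fitted Random Forest model, and sequential sampling without replacement is an assumption about how one household per building might be enforced.

```python
import random


def assign_households(households, buildings, type_probs, seed=0):
    """Link each household to a distinct building.

    households: list of (household_id, household_type)
    buildings:  list of (building_id, lat, lon)
    type_probs: dict mapping household_type -> list of weights, one per building
                (hypothetical stand-in for a modelled probability surface)
    """
    if len(households) > len(buildings):
        raise ValueError("need at least one building per household")
    rng = random.Random(seed)
    free = list(range(len(buildings)))  # indices of unoccupied buildings
    links = {}
    for hh_id, hh_type in households:
        # Weights of the still-unoccupied buildings for this household type.
        weights = [type_probs[hh_type][i] for i in free]
        if sum(weights) <= 0:
            # Assumption: fall back to uniform sampling when no suitable
            # building remains for this type.
            weights = [1.0] * len(free)
        pick = rng.choices(range(len(free)), weights=weights, k=1)[0]
        links[hh_id] = buildings[free.pop(pick)]
    return links


# Hypothetical inputs: four building points and three simulated households
# already classified into two types (e.g., by k-means).
buildings = [
    ("b0", -18.80, 16.90),
    ("b1", -18.81, 16.91),
    ("b2", -18.50, 17.20),
    ("b3", -18.51, 17.21),
]
type_probs = {
    "A": [0.7, 0.3, 0.0, 0.0],  # type A households favour b0 and b1
    "B": [0.0, 0.0, 0.5, 0.5],  # type B households favour b2 and b3
}
households = [("h1", "A"), ("h2", "A"), ("h3", "B")]

links = assign_households(households, buildings, type_probs, seed=42)
for hh_id, (b_id, lat, lon) in links.items():
    print(hh_id, "->", b_id, lat, lon)
```

Running the function with different seeds yields different realizations of the same population, analogous in spirit to the multiple realizations described in the abstract.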
