GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data

Background
Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Survey samples are typically drawn from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces GridSample, an R algorithm for selecting primary sampling units (PSUs) for complex household surveys from gridded population data. Given a gridded population dataset and a geographic boundary of the study area, GridSample uses a two-step process: it samples “seed” cells with probability proportionate to estimated population size, then “grows” each seed into a PSU until the PSU reaches a minimum population. The algorithm permits stratification and oversampling of urban or rural areas. Because grid cells are approximately uniform in size and shape, GridSample also allows spatial oversampling, which is not possible in typical surveys and may improve small-area estimates derived from survey results.

Results
We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs and 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs, 405 rural PSUs, and a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs in specific districts, whereas GridSample reallocated PSUs from rural to urban areas across all districts.

Conclusions
Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate.
Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g., random walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to advance this research agenda.
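The two-step seed-and-grow procedure described in the Background can be sketched in Python on a toy grid. This is a minimal illustration under stated assumptions, not the GridSample R implementation. Seeds are drawn here by systematic PPS sampling, and each PSU grows breadth-first over adjacent (4-neighbour) cells not already claimed by another PSU. Both choices, and the function names `pps_systematic`, `grow_psu`, and `sample_psus`, are hypothetical.

```python
import random
from collections import deque


def pps_systematic(weights, n, rng):
    """Systematic probability-proportional-to-size (PPS) selection.

    Lays the weights end to end, picks a random start in [0, interval) and
    takes every interval-th point. An index with weight w <= interval is hit
    with probability n * w / sum(weights); larger weights can be hit twice.
    """
    total = float(sum(weights))
    interval = total / n
    start = rng.random() * interval
    targets = [start + i * interval for i in range(n)]
    chosen, cumulative, t = [], 0.0, 0
    for idx, w in enumerate(weights):
        cumulative += w
        while t < n and targets[t] < cumulative:
            chosen.append(idx)
            t += 1
    return chosen


def grow_psu(pop, seed, min_pop, taken):
    """Hypothetical growth rule: add unclaimed 4-neighbours breadth-first
    until the PSU population reaches min_pop or no neighbours remain."""
    rows, cols = len(pop), len(pop[0])
    psu, total = [seed], pop[seed[0]][seed[1]]
    taken.add(seed)
    frontier = deque([seed])
    while total < min_pop and frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in taken:
                taken.add((nr, nc))
                psu.append((nr, nc))
                total += pop[nr][nc]
                frontier.append((nr, nc))
                if total >= min_pop:
                    break
    return psu, total


def sample_psus(pop, n_seeds, min_pop, seed=None):
    """Step 1: PPS-sample seed cells. Step 2: grow each seed into a PSU."""
    rng = random.Random(seed)
    cols = len(pop[0])
    flat = [v for row in pop for v in row]  # row-major cell populations
    taken, psus = set(), []
    # dict.fromkeys drops repeated seeds while keeping selection order
    for idx in dict.fromkeys(pps_systematic(flat, n_seeds, rng)):
        cell = divmod(idx, cols)  # flat index -> (row, col)
        if cell in taken:
            continue  # seed already absorbed by an earlier PSU
        psus.append(grow_psu(pop, cell, min_pop, taken))
    return psus
```

For example, `sample_psus(grid, n_seeds=8, min_pop=60, seed=1)` returns a list of `(cells, population)` pairs with no cell shared between PSUs. A PSU can fall short of `min_pop` if it is hemmed in by the grid edge or by earlier PSUs. A production version would need a rule for that case.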
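Stratification with urban oversampling, as in the Rwanda replication, means deciding how many PSUs each stratum (e.g., district) and urban/rural domain receives before seeds are drawn within it. The sketch below uses one simple, hypothetical rule that is not necessarily GridSample's. Each domain's population is multiplied by an oversampling factor if urban, and the PSU total is then apportioned by the largest-remainder method so the counts sum exactly. The function `allocate_psus` and its parameter `urban_factor` are illustrative assumptions.

```python
def allocate_psus(pop, n_psus, urban_factor=1.0):
    """Apportion n_psus across (stratum, area_type) domains.

    pop maps (stratum, "urban" | "rural") -> population. Urban populations
    are inflated by urban_factor to oversample urban areas; the resulting
    quotas are rounded with the largest-remainder method so that the
    allocation sums exactly to n_psus.
    """
    scores = {
        key: p * (urban_factor if key[1] == "urban" else 1.0)
        for key, p in pop.items()
    }
    total = sum(scores.values())
    quotas = {key: n_psus * s / total for key, s in scores.items()}
    alloc = {key: int(q) for key, q in quotas.items()}  # floor (quotas >= 0)
    leftover = n_psus - sum(alloc.values())
    # Give the remaining PSUs to the domains with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for key in by_remainder[:leftover]:
        alloc[key] += 1
    return alloc
```

Consider two strata, each with 100 urban and 900 rural residents, and 20 PSUs in total. A purely proportional split gives each urban domain 1 PSU, and `urban_factor=2.0` raises that to 2. Because urban households then have a higher chance of selection, estimates from such a sample need design weights equal to the inverse of each household's inclusion probability.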
