Validation of spatiodemographic estimates produced through data fusion of small area census records and household microdata

Abstract: Despite the increasing availability of current national censuses, these datasets lack demographic depth at the small area level. At the same time, microdata that include detailed demographic information are available only for coarse geographies, which constrains complex analysis of population subgroups within and between small areas. Techniques such as Iterative Proportional Fitting (IPF) have previously been suggested as a means to generate new data that combine the demographic granularity of individual surveys with the spatial granularity of small area tabulations from censuses and surveys. This article explores internal and external validation approaches for synthetic, small area, household- and individual-level microdata, using Bangladesh as a case study. Using data from the Bangladesh Census 2011 and the Demographic and Health Survey, we produce small area estimates of infant mortality rate and other household attributes using P-MEDM, a variant of iterative proportional fitting. We conduct an internal validation to determine whether the model accurately recreates the spatial variation of the input data, how each variable performs overall, and how the estimates compare to the published population totals. We conduct an external validation by comparing the estimates with indicators from the 2009 Multiple Indicator Cluster Survey (MICS) for Bangladesh, benchmarking them against a known dataset that was not used in the original model. The results indicate that the estimation process is viable for regions that are well represented in the microdata sample, but they also reveal the possibility of strong overfitting in sparsely sampled sub-populations.
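To make the fitting idea concrete, the following is a minimal sketch of classic two-dimensional iterative proportional fitting, not the P-MEDM variant used in the article. A seed cross-tabulation (standing in for survey microdata) is rescaled, alternately by rows and by columns, until its margins match target totals (standing in for small area census tabulations). The function name `ipf`, its parameters, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Classic 2-D iterative proportional fitting (illustrative sketch).

    seed        : cross-tabulation from a sample (hypothetical survey counts)
    row_targets : known row margins for the small area (hypothetical)
    col_targets : known column margins for the small area (hypothetical)
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)

    for _ in range(max_iter):
        # Scale each row so that row sums match the row targets.
        row_sums = table.sum(axis=1, keepdims=True)
        table *= np.divide(row_targets[:, None], row_sums,
                           out=np.ones_like(row_sums), where=row_sums > 0)

        # Scale each column so that column sums match the column targets.
        col_sums = table.sum(axis=0, keepdims=True)
        table *= np.divide(col_targets[None, :], col_sums,
                           out=np.ones_like(col_sums), where=col_sums > 0)

        # Columns are exact after the last step; stop once rows also agree.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table

# Toy inputs (made up): a 2x2 survey cross-tab and margins for one small area.
seed = np.array([[10.0, 20.0],
                 [30.0, 40.0]])
row_targets = np.array([40.0, 60.0])
col_targets = np.array([35.0, 65.0])

fitted = ipf(seed, row_targets, col_targets)
print(fitted.sum(axis=1), fitted.sum(axis=0))
```

The fitted table reproduces both sets of margins while keeping the association structure (the cross-product ratio) of the seed. This is also why sparsely sampled subgroups can be problematic: the fitted cells can only reflect the structure that the seed sample happens to contain.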
