Validation of Spatially Allocated Small Area Estimates for 1880 Census Demography

Objective: This paper details the validation of a methodology that spatially allocates census microdata to census tracts based on known aggregate tract population distributions. To protect confidentiality, public-use microdata contain no spatial identifier other than the code of the Public Use Microdata Area (PUMA) in which the individual or household is located. Confirmatory information, including the location of microdata households, can be obtained only in a Census Research Data Center (CRDC). Because of the restrictions in place at CRDCs, a systematic procedure for validating the spatial allocation methodology needs to be in place before CRDC data are accessed.

Methods: This study demonstrates and evaluates such a procedure using historical census data for which a 100% count of the population is available at a fine spatial resolution. The approach allows the behavior of a maximum entropy imputation and spatial allocation model to be tested under different specifications. The imputation and allocation are performed using a sample of microdata records drawn from the full 1880 Census enumeration and synthetic summary files created from the same source. The allocation results are then validated against the actual values from the 1880 100% count.

Results: The validation procedure yields statistics that support an in-depth evaluation of the household allocation and identify optimal configurations for model parameterization. This provides important insights into how to design a validation procedure at a CRDC for spatial allocations using contemporary census data.
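The validation loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the paper's model: iterative proportional fitting (IPF), a standard entropy-maximizing procedure, stands in for the maximum entropy imputation and allocation model, whose specification is not given here. The population is simulated rather than taken from the 1880 enumeration, and every name (`ipf_allocate`, `tabulate`, the two household attributes `a` and `b`, the error statistic) is a hypothetical choice made for illustration.

```python
import numpy as np

def tabulate(tract, cat, n_tracts, n_cat):
    """Count households per (tract, category); plays the role of a summary file."""
    out = np.zeros((n_tracts, n_cat))
    np.add.at(out, (tract, cat), 1)
    return out

def ipf_allocate(sample_a, sample_b, tract_a, tract_b, n_iter=200, tol=1e-8):
    """Reweight sample households to tract marginals with IPF.

    Returns w with shape (n_tracts, n_households); w[t, h] is the number of
    households in tract t represented by sample household h.
    """
    A = np.eye(tract_a.shape[1])[sample_a]   # one-hot codes for attribute a
    B = np.eye(tract_b.shape[1])[sample_b]   # one-hot codes for attribute b
    w = np.ones((tract_a.shape[0], sample_a.shape[0]))  # uniform starting weights
    for _ in range(n_iter):
        for onehot, target in ((A, tract_a), (B, tract_b)):
            est = w @ onehot
            ratio = np.divide(target, est, out=np.zeros_like(est), where=est > 0)
            w = w * (ratio @ onehot.T)       # scale each household by its category ratio
        if np.abs(w @ A - tract_a).max() < tol:
            break
    return w

# --- Simulated "100% count" (assumption: stands in for the full enumeration) ---
rng = np.random.default_rng(0)
n_tracts, n_a, n_b, n_full = 4, 3, 2, 4000
tract = rng.integers(0, n_tracts, n_full)
p_a = np.array([[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.25, 0.25, 0.5]])
cum = np.cumsum(p_a[tract], axis=1)[:, :-1]
a = (rng.random(n_full)[:, None] > cum).sum(axis=1)
b = (rng.random(n_full) < 0.2 + 0.1 * tract + 0.1 * a).astype(int)

# Synthetic summary files: tract marginals only, no joint distribution
tract_a = tabulate(tract, a, n_tracts, n_a)
tract_b = tabulate(tract, b, n_tracts, n_b)

# Microdata sample: attributes retained, tract identifier discarded
idx = rng.choice(n_full, size=400, replace=False)
sample_a, sample_b = a[idx], b[idx]

w = ipf_allocate(sample_a, sample_b, tract_a, tract_b)

# Validation: compare the allocated joint table with the true joint table
cell = np.eye(n_a * n_b)[sample_a * n_b + sample_b]
est_joint = (w @ cell).reshape(n_tracts, n_a, n_b)
true_joint = np.zeros((n_tracts, n_a, n_b))
np.add.at(true_joint, (tract, a, b), 1)
abs_error_per_household = np.abs(est_joint - true_joint).sum() / n_full
print(f"total absolute error per household: {abs_error_per_household:.4f}")
```

The allocation sees only the tract marginals, so the joint table of the two attributes within each tract is the quantity that can be checked against the full count. Alternative model specifications would be compared by rerunning the same check and comparing the resulting error statistics.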
