Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization

The American Community Survey (ACS) is the largest survey of US households and the principal source of neighborhood-scale information about the US population and economy. The ACS is used to allocate billions of dollars in federal spending and is a critical input to social science research in the US. However, ACS estimates can be highly unreliable. For example, in over 72% of census tracts, the estimated number of children under 5 in poverty has a margin of error greater than the estimate itself. Uncertainty of this magnitude complicates the use of social data in policymaking, research, and governance. This article presents a heuristic spatial optimization algorithm that can reduce the margins of error in survey data by creating new composite geographies, a process called regionalization. Regionalization is a complex combinatorial problem. Rather than focusing on its technical aspects, we demonstrate how to use a purpose-built, open-source regionalization algorithm to process survey data so that margins of error fall below a user-specified threshold.
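The reason aggregation helps can be illustrated with a small sketch. It assumes the Census Bureau's standard root-sum-of-squares approximation for the margin of error (MOE) of a sum of ACS estimates, and it uses the coefficient of variation (CV, the standard error divided by the estimate) as the user-specified reliability target. Merging two tracts with equal estimates and equal MOEs doubles the estimate but multiplies the MOE by only √2, so the CV falls by a factor of √2 (about 0.608 to 0.430 in the toy data). The greedy, contiguity-constrained growth procedure below is a hypothetical stand-in for illustration only; it is not the authors' heuristic optimization algorithm, and the function names and data are invented for this example.

```python
import math
from typing import Dict, List, Set, Tuple

Z90 = 1.645  # ACS margins of error are published at the 90% confidence level


def aggregate(estimates: List[float], moes: List[float]) -> Tuple[float, float]:
    """Sum estimates and approximate the MOE of the sum as the root sum of squares.

    This follows the Census Bureau's standard approximation for sums of ACS
    estimates; it ignores covariance between the component estimates.
    """
    return sum(estimates), math.sqrt(sum(m * m for m in moes))


def cv(estimate: float, moe: float) -> float:
    """Coefficient of variation: standard error (MOE / 1.645) divided by the estimate."""
    if estimate <= 0:
        return math.inf
    return (moe / Z90) / estimate


def greedy_regionalize(
    est: Dict[str, float],
    moe: Dict[str, float],
    neighbors: Dict[str, List[str]],
    max_cv: float,
) -> List[Set[str]]:
    """Hypothetical greedy sketch: grow contiguous regions until each CV <= max_cv.

    Regions that run out of unassigned neighbors before reaching the threshold
    are returned as-is; a fuller method would merge or reassign them.
    """
    unassigned = set(est)
    regions: List[Set[str]] = []
    for seed in sorted(est):  # deterministic seed order
        if seed not in unassigned:
            continue
        region = {seed}
        unassigned.discard(seed)
        e, m = est[seed], moe[seed]
        while cv(e, m) > max_cv:
            # Candidates: unassigned areas adjacent to any member of the region.
            frontier = {n for a in region for n in neighbors[a] if n in unassigned}
            if not frontier:
                break  # stuck: threshold not met for this region
            # Add the neighbor that gives the enlarged region the lowest CV.
            best = min(
                frontier,
                key=lambda n: (cv(*aggregate([e, est[n]], [m, moe[n]])), n),
            )
            region.add(best)
            unassigned.discard(best)
            e, m = aggregate([e, est[best]], [m, moe[best]])
        regions.append(region)
    return regions


# Four hypothetical tracts in a row: A - B - C - D.
est = {"A": 100, "B": 100, "C": 100, "D": 100}
moe = {"A": 100, "B": 100, "C": 100, "D": 100}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

print(round(cv(est["A"], moe["A"]), 3))  # 0.608 for a single tract
regions = greedy_regionalize(est, moe, neighbors, max_cv=0.45)
print([sorted(r) for r in regions])  # [['A', 'B'], ['C', 'D']]
```

A greedy growth rule like this one can leave isolated areas that never reach the threshold, which is one reason a heuristic optimization approach is used for the full problem.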
