On the use of ZIP codes and ZIP code tabulation areas (ZCTAs) for the spatial analysis of epidemiological data

Background

Although the use of spatially referenced data in epidemiological analysis is growing, so are the issues associated with selecting an appropriate geographic unit of analysis. A particularly problematic unit is the ZIP code. Because ZIP codes lack standardization and are highly dynamic in structure, using them, or ZIP code tabulation areas (ZCTAs), for the spatial analysis of disease presents a unique challenge to researchers. This paper explores the problems associated with using these units to detect spatial patterns of disease.

Results

A brief review of ZIP codes and their spatial representation is conducted. Though frequently represented as polygons to facilitate analysis, ZIP codes are actually defined at a finer spatial resolution that reflects the street addresses they serve. This research shows that their generalization as continuous regions is an imposed structure that can have serious implications for the interpretation of research results. ZIP code areas and Census-defined ZCTAs, two commonly used polygonal representations of ZIP code address ranges, are examined to identify the spatial statistical sensitivities that emerge from differences in how these representations are defined. The comparative analysis focuses on the detection of patterns of prostate cancer in New York State. Of particular interest for studies that use local spatial statistical tests is that differences in the topological structures of ZIP code areas and ZCTAs give rise to different spatial patterns of disease. These differences stem from the different methodologies used to generalize ZIP code information. Given the difficulty of generating ZIP code boundaries, both ZIP code areas and ZCTAs contain numerous representational errors that can have a significant impact on spatial analysis. While the use of ZIP code polygons for spatial analysis is relatively straightforward, ZCTA representations contain additional topological features (e.g. lakes and rivers) and fragmented polygons that can hinder spatial analysis.

Conclusion

Caution must be exercised when using spatially referenced data for epidemiological analysis, particularly data attributed to ZIP codes and ZCTAs. Researchers should be cognizant of the representational errors associated with both geographies and of the resulting spatial mismatch, especially when comparing results obtained using different topological representations. While ZCTAs can be problematic, topological corrections are easily implemented in a geographic information system to remedy erroneous aggregation effects.
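The sensitivity of local statistics to topology can be illustrated with a minimal sketch. The abstract does not name the local test used, so local Moran's I is an assumption here, as are the five areal units, their rates, and both adjacency structures (all hypothetical, not taken from the study). The only difference between the two representations is that a water feature breaks the shared boundary between units 3 and 4 in the second one.

```python
from statistics import mean


def local_morans_i(values, neighbors):
    """Local Moran's I per unit, using row-standardized binary contiguity weights.

    values: list of rates, one per areal unit.
    neighbors: dict mapping a unit index to the list of adjacent unit indices.
    """
    n = len(values)
    mu = mean(values)
    z = [v - mu for v in values]          # deviations from the global mean
    m2 = sum(d * d for d in z) / n        # second moment of the deviations
    result = []
    for i in range(n):
        adj = neighbors.get(i, [])
        # Spatial lag: mean deviation among neighbors (0 for an isolated unit).
        lag = sum(z[j] for j in adj) / len(adj) if adj else 0.0
        result.append(z[i] * lag / m2)
    return result


# Hypothetical rates for five areal units (illustrative numbers only).
rates = [10.0, 12.0, 11.0, 30.0, 28.0]

# Representation A: units 3 and 4 share a boundary.
topology_a = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

# Representation B: a water polygon separates unit 4 from unit 3,
# so unit 4 has no neighbors and unit 3 loses one.
topology_b = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: []}

i_a = local_morans_i(rates, topology_a)
i_b = local_morans_i(rates, topology_b)

for unit, (a, b) in enumerate(zip(i_a, i_b)):
    print(f"unit {unit}: I_A = {a:+.3f}, I_B = {b:+.3f}")
```

With identical rates, unit 3 has a positive local statistic under representation A (a high value next to a high value) and a negative one under representation B (a high value whose only neighbor is low), while unit 4 drops to zero because it has no neighbors. A significance test would normally follow; it is omitted to keep the sketch short.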
