Geographic bias related to geocoding in epidemiologic studies

Background

This article describes geographic bias in GIS analyses with unrepresentative data owing to missing geocodes, using as an example a spatial analysis of prostate cancer incidence among whites and African Americans in Virginia, 1990–1999. Statistical tests for clustering were performed and the resulting clusters were mapped. The patterns of missing census tract identifiers for the cases were examined by generalized linear regression models.

Results

The county of residency for all cases was known, and 26,338 (74%) of these cases were geocoded successfully to census tracts. Cluster maps showed patterns that appeared markedly different, depending upon whether one used all cases or only those geocoded to the census tract. Multivariate regression analysis showed that, in the most rural counties (where the missing data were concentrated), the percentage of a county's population over age 64 and the percentage with less than a high school education were both independently associated with a higher percentage of missing geocodes.

Conclusion

We found statistically significant pattern differences resulting from spatially non-random differences in geocoding completeness across Virginia. Appropriate interpretation of maps, therefore, requires an understanding of this phenomenon, which we call "cartographic confounding."
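To make the idea of spatially non-random geocoding completeness concrete, the following is a minimal sketch of how one might tabulate the percentage of missing geocodes per county alongside the overall percentage. It is not the authors' analysis code: the function name `missing_geocode_summary`, the county labels, and the counts are all hypothetical and chosen only for illustration.

```python
def missing_geocode_summary(counties):
    """Summarize geocoding completeness.

    counties: mapping of county name -> (total_cases, geocoded_cases).
    Returns (per-county percent missing, overall percent missing).
    """
    per_county = {}
    total_cases = 0
    total_geocoded = 0
    for name, (n_cases, n_geocoded) in counties.items():
        if n_cases <= 0 or not 0 <= n_geocoded <= n_cases:
            raise ValueError(f"inconsistent counts for {name}")
        # Percent of this county's cases lacking a census tract geocode.
        per_county[name] = 100.0 * (n_cases - n_geocoded) / n_cases
        total_cases += n_cases
        total_geocoded += n_geocoded
    overall = 100.0 * (total_cases - total_geocoded) / total_cases
    return per_county, overall


# Hypothetical counts (NOT from the study): one urban, one rural county.
example = {"urban_a": (1000, 950), "rural_b": (200, 80)}
per_county, overall = missing_geocode_summary(example)
for name, pct in per_county.items():
    print(f"{name}: {pct:.1f}% missing")
print(f"overall: {overall:.1f}% missing")
```

In this illustrative input the overall missing percentage looks moderate while the rural county loses most of its cases. That is the kind of uneven completeness that can distort a tract-level cluster map.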
