A multi-stage approach to maximizing geocoding success in a large population-based cohort study through automated and interactive processes.

To enable spatial analyses within a large, prospective cohort study of nearly 86,000 adults enrolled across a 12-state area of the southeastern United States from 2002 to 2009, a multi-stage geocoding protocol was developed to efficiently maximize the proportion of participants assigned an address-level geographic coordinate. Addresses were parsed, cleaned, and standardized before a combination of automated and interactive geocoding tools was applied. Our full protocol increased the match rate for non-Post Office (PO) Box addresses from 74.5% to 97.6%. Overall, we geocoded 99.96% of participant addresses, with only 5.2% geocoded at the ZIP code centroid level (2.8% PO Box and 2.3% non-PO Box addresses). One key to reducing the need for interactive geocoding was the use of multiple base maps. Still, addresses in areas with a population density of <44 persons/km² were much more likely to require resource-intensive interactive geocoding than those in areas with >920 persons/km² (odds ratio (OR) = 5.24; 95% confidence interval (CI) = 4.23, 6.49), as were addresses collected from participants during in-person interviews compared with those collected by mailed questionnaire (OR = 1.83; 95% CI = 1.59, 2.11). This study demonstrates that population density and address ascertainment method can influence automated geocoding results, and that high success in address-level geocoding is achievable for large-scale studies covering wide geographical areas.
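
The staged fallback described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the PO Box pattern, the result fields, and the ordering of stages are all assumptions made for illustration. Each stage is a hypothetical callable (for example, one automated geocoder per base map, optionally followed by a lookup of interactively resolved addresses) that returns a coordinate or `None`.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]                    # (latitude, longitude)
Geocoder = Callable[[str], Optional[Point]]    # hypothetical stage interface

# Assumed pattern for spotting PO Box addresses; real data would need more variants.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b", re.IGNORECASE)


def standardize(address: str) -> str:
    """Upper-case and collapse whitespace; a stand-in for real parsing and cleaning."""
    return " ".join(address.upper().split())


def geocode_multistage(
    address: str,
    zip_code: str,
    stages: List[Tuple[str, Geocoder]],
    zip_centroids: Dict[str, Point],
) -> dict:
    """Try each stage in order, then fall back to the ZIP code centroid."""
    cleaned = standardize(address)

    # PO Boxes have no street location, so they skip the address-level stages.
    if not PO_BOX.search(cleaned):
        for name, geocoder in stages:
            point = geocoder(cleaned)
            if point is not None:
                return {"address": cleaned, "point": point,
                        "level": "address", "source": name}

    # Last resort: a coarser ZIP-level coordinate, if one is known.
    centroid = zip_centroids.get(zip_code)
    if centroid is not None:
        return {"address": cleaned, "point": centroid,
                "level": "zip_centroid", "source": "zip"}

    return {"address": cleaned, "point": None,
            "level": "unmatched", "source": None}
```

Recording which stage produced each match (the `source` field here) makes it straightforward to tabulate how much each additional base map contributes before any address is passed to the more costly interactive step.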
