Positional error in automated geocoding of residential addresses

Background
Public health applications using geographic information system (GIS) technology are steadily increasing. Many of these rely on the ability to locate where people live relative to areas of exposure to environmental contaminants. Automated geocoding assigns geographic coordinates to an individual based on their street address. This method often relies on street centerline files as a geographic reference, a process that introduces positional error into the geocoded point. Our study evaluated the positional error introduced by automated geocoding of residential addresses and how this error varies with population density. We also evaluated an alternative method of geocoding using residential property parcel data.

Results
Positional error was determined for 3,000 residential addresses as the distance between each geocoded point and its true location, as determined from aerial imagery. Error increased as population density decreased. In rural areas of an upstate New York study area, 95 percent of the addresses geocoded to within 2,872 m of their true location. Suburban areas showed less error, with 95 percent of the addresses geocoded to within 421 m. Urban areas showed the least error, with 95 percent of the addresses geocoded to within 152 m of their true location. As an alternative to street centerline files, we used residential property parcel points to locate the addresses. In rural areas, 95 percent of the parcel points were within 195 m of the true location; in suburban areas this distance was 39 m, and in urban areas it was 21 m.

Conclusion
Researchers need to determine whether the level of error introduced by their chosen geocoding method could affect the results of their project. If the error from traditional methods is found to be unacceptable, property parcel data can be used as an alternative for geocoding addresses.
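Street centerline geocoding typically places an address by interpolating along a street segment's address range, which assumes that houses are evenly spaced along the segment. The following is a minimal sketch of that interpolation. It assumes planar (projected) coordinates in metres. `StreetSegment`, `interpolate_address`, and the `offset_m` setback parameter are hypothetical names used for illustration, not the software used in the study.

```python
import math
from dataclasses import dataclass


@dataclass
class StreetSegment:
    """Hypothetical centerline segment with the address range for one side of the street."""
    from_addr: int                 # house number at the segment's start node
    to_addr: int                   # house number at the segment's end node
    start: tuple[float, float]     # (x, y) of the start node, projected metres
    end: tuple[float, float]       # (x, y) of the end node, projected metres


def interpolate_address(house_number: int, seg: StreetSegment,
                        offset_m: float = 0.0) -> tuple[float, float]:
    """Place a house number on a segment by linear interpolation of its address range.

    Assumes addresses are spread uniformly along the segment. offset_m moves the
    point perpendicular to the centerline (positive = left of the start->end direction).
    """
    lo, hi = sorted((seg.from_addr, seg.to_addr))
    if not lo <= house_number <= hi:
        raise ValueError("house number outside the segment's address range")

    # Fraction of the way along the segment implied by the house number.
    span = seg.to_addr - seg.from_addr
    t = 0.0 if span == 0 else (house_number - seg.from_addr) / span

    (x0, y0), (x1, y1) = seg.start, seg.end
    x = x0 + t * (x1 - x0)
    y = y0 + t * (y1 - y0)

    length = math.hypot(x1 - x0, y1 - y0)
    if offset_m and length > 0:
        # Unit normal pointing to the left of the segment direction.
        nx, ny = -(y1 - y0) / length, (x1 - x0) / length
        x += offset_m * nx
        y += offset_m * ny
    return (x, y)
```

Consider a 1,000 m segment carrying addresses 100 to 198. House 149 is placed at the 500 m mark. If the lots are unevenly spaced and that house actually stands 900 m along the segment, the geocoded point is 400 m off. Long rural segments with sparse, irregular lots weaken the even-spacing assumption. That is one plausible contributor to the larger rural errors reported above.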

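The Results paragraph measures positional error as the distance between each geocoded point and its true location. It then summarizes that error as the distance within which 95 percent of addresses fall, stratified by population density. The following is a minimal sketch of that summary. It assumes projected coordinates in metres and a nearest-rank percentile, because the abstract does not state which percentile definition was used. The record layout and function names are hypothetical.

```python
import math
from collections import defaultdict


def positional_error(geocoded, true):
    """Straight-line distance between two (x, y) points in a projected CRS (metres)."""
    return math.hypot(geocoded[0] - true[0], geocoded[1] - true[1])


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the sample at or below it."""
    if not values:
        raise ValueError("empty sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_by_density(records, p=95):
    """Return the p-th percentile positional error for each density class.

    records: iterable of (density_class, geocoded_xy, true_xy) tuples,
    e.g. ("rural", (x_geo, y_geo), (x_true, y_true)).
    """
    errors = defaultdict(list)
    for density, geo, true in records:
        errors[density].append(positional_error(geo, true))
    return {density: percentile_nearest_rank(d, p) for density, d in errors.items()}
```

Running the same function on centerline-based points and on parcel-based points gives a side-by-side comparison of the two methods, like the one in the Results paragraph.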