Accuracy and repeatability of commercial geocoding.

The authors estimated accuracy and repeatability of commercial geocoding to guide vendor selection in the Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study (2001-2002). They submitted 1,032 participant addresses (97% in Maryland, Minnesota, Mississippi, or North Carolina) to vendor A twice over 9 months and measured repeatability as agreement between levels of address matching, discordance (%) between statistical tabulation areas, and median distance (d, in meters) and bearing (θ, in degrees) between coordinates assigned on each occasion ($H_0\colon \sum_{i=1}^{n} \theta_i / n = 180^\circ$). They also submitted 75 addresses of nearby air pollution monitors (77% urban/suburban; 69% residential/commercial) to vendors A and B and then measured accuracy by comparing vendor- and US Environmental Protection Agency (EPA)-assigned geocodes using the above measures. Repeatability of geocodes assigned by vendor A was high (kappa = 0.90; census block group discordance = 5%; d < 1 m; θ = 177°). The match rate for EPA monitor addresses was higher for vendor B than for vendor A (88% vs. 76%), but discordance at the census block group, tract, and county levels was also 1.4-, 1.9-, and 5.0-fold higher, respectively, for vendor B. Moreover, coordinates assigned by vendor B were farther from those assigned by the EPA (d = 212 m vs. 149 m; θ = 131° vs. 171°). These findings suggest that match rates, repeatability, and accuracy should be used to guide vendor selection.
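
The three repeatability measures named above (distance d, bearing θ, and kappa) can be illustrated with a minimal Python sketch. It rests on assumptions: the abstract does not say which formulas or software the authors used, so the haversine great-circle distance, the initial-bearing formula, the unweighted Cohen's kappa, the Earth radius constant, and all function names here are illustrative choices and not the study's method.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equal-length lists of category labels.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[k] * c2[k] for k in c1) / n ** 2
    return (observed - expected) / (1 - expected)
```

Applied to paired geocodes from two submissions of the same address, `haversine_m` and `bearing_deg` give the per-address d and θ, which could then be summarized across addresses, while `cohen_kappa` compares the match-level category assigned on each occasion.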
