Conceptual and practical issues in the detection of local disease clusters: a study of mortality in Hamilton, Ontario

Recent advances in local spatial statistics and operational computing capacity have led to growing interest in the detection of disease clusters, both for public health surveillance and for improving understanding of disease pathogenesis. Although conceptual reviews and applied examples have appeared in the literature, few studies have addressed the connection between the conceptual and practical issues that confront researchers interested in using local statistics to detect disease clusters. Here we review recent literature on the use of local statistics for cluster assessment and focus on the practical issue of assigning correct geographic coordinates. The process of assigning geographic coordinates to an address or postal code, known as 'geocoding', is a necessary step in conducting small-area health analyses. With a study of mortality data from Hamilton, Ontario, we illustrate inaccuracies that may be encountered when using Statistics Canada postal code conversion files. Using the Moran's I and the Getis-Ord Gi and Gi* local spatial statistics to identify significant mortality clusters or 'hot spots', we demonstrate that small geocoding errors, even those that affect less than one percent of a total dataset, can have a discernible impact on analytic results. To assist other researchers, we supply guidelines to minimize error introduced by geocoding. These results emphasize the importance of accurate geocoding in local health analyses.

Key words: local spatial statistics, geocoding, cluster analysis, GIS, Hamilton

Introduction

The advent of geographic information systems (GIS) and associated spatial statistical software has expanded the use of spatial analysis in environmental and public health research (Chen et al. 1998; Getis 1998; Rushton 1998; Dearwent et al. 2001). GIS can be used as a spatial analysis system for the organization, storage, transformation, retrieval, analysis, and display of spatial or geographic data (Aronoff 1989; DeMers 1998). The use of GIS in the spatial analysis of disease has facilitated the study of small, localized areas to investigate not only inter-regional variations in disease prevalence (e. …
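The abstract names the Getis-Ord Gi* statistic and argues that a handful of misplaced records can change which locations are flagged as hot spots. This excerpt does not give the formula or the authors' settings, so the following is only a minimal sketch under stated assumptions: it uses the standardized Gi* form published by Getis and Ord, binary distance-band weights (an assumption; the study's weighting scheme is not given here), and a hypothetical 3 x 3 grid of invented values rather than the Hamilton mortality data. The function name `gi_star` and the variable names are illustrative.

```python
import math


def gi_star(points, values, band):
    """Standardized Getis-Ord Gi* score for every location.

    Assumes binary distance-band weights: w_ij = 1 when location j lies
    within `band` of location i (i itself included), otherwise 0.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the attribute (guarded against
    # tiny negative values caused by floating-point rounding).
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for xi, yi in points:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator means no variation or a band covering all points.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical data: a 3 x 3 grid with high values clustered near (0, 0).
points = [(x, y) for x in range(3) for y in range(3)]
values = [9, 8, 1, 8, 7, 1, 1, 1, 1]
scores = gi_star(points, values, band=1.0)

# Simulated geocoding error: the record at (0, 0) is assigned to (3, 2).
misplaced = list(points)
misplaced[0] = (3, 2)
shifted = gi_star(misplaced, values, band=1.0)

for label, result in (("correct", scores), ("misplaced", shifted)):
    print(label, [round(z, 2) for z in result])
```

Gi* scores are read as z-scores, so values above roughly 1.96 are conventionally treated as hot spots at the 5% level before any adjustment for multiple testing. In this toy layout, moving a single record is enough to drop the location that was the strongest hot spot below that threshold, which mirrors the kind of sensitivity the abstract describes.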

[1] S. Dearwent, et al. Locational uncertainty in georeferencing public health datasets, 2001, Journal of Exposure Analysis and Environmental Epidemiology.

[2] Manfred M. Fischer, et al. Recent Developments in Spatial Analysis, 1997.

[3] Graham J. Wills, et al. Dynamic Graphics for Exploring Spatial Data with Application to Locating Global and Local Anomalies, 1991.

[4] W. Gesler. The uses of spatial analysis in medical geography: a review, 1986, Social Science & Medicine.

[5] P. Haggett. Geographical aspects of the emergence of infectious diseases, 1994.

[6] Antony Unwin, et al. Exploratory spatial data analysis with local statistics, 1998.

[7] D. Wartenberg, et al. Solving the cluster puzzle: clues to follow and pitfalls to avoid, 1993, Statistics in Medicine.

[8] P. Enterline. Evaluating cancer clusters, 1985, American Industrial Hygiene Association Journal.

[9] P. Diggle. A point process modeling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point, 1990.

[10] G. M. Jacquez, et al. The Analysis of Disease Clusters, Part I: State of the Art, 1996, Infection Control & Hospital Epidemiology.

[11] S. Greenland, et al. Randomization, Statistics, and Causal Inference, 1990, Epidemiology.

[12] Martin Charlton, et al. The Geography of Parameter Space: An Investigation of Spatial Non-Stationarity, 1996, Int. J. Geogr. Inf. Sci.

[13] M. Cetron, et al. Geocoding and linking data from population-based surveillance and the US Census to evaluate the impact of median household income on the epidemiology of invasive Streptococcus pneumoniae infections, 1998, American Journal of Epidemiology.

[14] Peter A. Rogerson, et al. GIS and Spatial Analytical Problems, 1993, Int. J. Geogr. Inf. Sci.

[15] L. Waller, et al. The Analysis of Disease Clusters, Part II: Introduction to Techniques, 1996, Infection Control & Hospital Epidemiology.

[16] Luc Anselin, et al. Exploratory Spatial Data Analysis Linking SpaceStat and ArcView, 1997.

[17] K. C. Clarke, et al. On epidemiology and geographic information systems: a review and discussion of future directions, 1996, Emerging Infectious Diseases.

[18] Alan M. MacEachren, et al. Visualizing Georeferenced Data: Representing Reliability of Health Statistics, 1998.

[19] J. Bithell. The choice of test for detecting raised disease risk near a point source, 1995, Statistics in Medicine.

[20] B. Turnbull, et al. Monitoring for clusters of disease: application to leukemia incidence in upstate New York, 1990, American Journal of Epidemiology.

[21] R. Burnett, et al. A GIS–Environmental Justice Analysis of Particulate Air Pollution in Hamilton, Canada, 2001.

[22] T. Carpenter, et al. Spatial analytical methods and geographic information systems: use in health research and epidemiology, 1999, Epidemiologic Reviews.

[23] P. Diggle, et al. Spatial point pattern analysis and its application in geographical epidemiology, 1996.

[24] M. Kulldorff, et al. Childhood leukaemia in Sweden: using GIS and a spatial scan statistic for cluster detection, 1996, Statistics in Medicine.

[25] Stephen Wise, et al. Exploratory spatial data analysis in a geographic information system environment, 1998.

[26] Roger Marshall, et al. A Review of Methods for the Statistical Analysis of Spatial Patterns of Disease, 1991.

[27] K. J. Rothman, et al. A sobering start for the cluster busters' conference, 1990, American Journal of Epidemiology.

[28] S. D. Walter. The analysis of regional patterns in health data. I. Distributional considerations, 1992, American Journal of Epidemiology.

[29] W. McBride, et al. Thalidomide and Congenital Abnormalities, 1961.

[30] D. B. Rubin, et al. Practical implications of modes of statistical inference for causal effects and the critical role of the assignment mechanism, 1991, Biometrics.