Privacy protection versus cluster detection in spatial epidemiology.

OBJECTIVES: Patient data that include precise locations can reveal patients' identities, whereas data aggregated into administrative regions may preserve privacy and confidentiality. We investigated the effect of varying degrees of address precision (exact latitude and longitude vs the center points of zip codes or census tracts) on the detection of spatial clusters of cases.

METHODS: We simulated disease outbreaks by adding spatially clustered emergency department visits to authentic hospital emergency department syndromic surveillance data. We identified clusters with a spatial scan statistic and evaluated detection rate and accuracy.

RESULTS: More clusters were identified, and clusters were detected more accurately, when exact locations were used. That is, the clusters detected from exact locations contained at least half of the simulated points and included few additional emergency department visits. These results were especially apparent when the synthetic clustered points crossed administrative boundaries and fell into multiple zip codes or census tracts.

CONCLUSIONS: The spatial cluster detection algorithm performed better when addresses were analyzed as exact locations than when they were analyzed as center points of zip codes or census tracts, particularly when the clustered points crossed administrative boundaries. Use of precise addresses offers improved performance, but this practice must be weighed against privacy concerns when public health data exchange policies are established.
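The boundary effect described above can be illustrated with a minimal sketch. It is not the study's method or data: a square grid stands in for zip codes or census tracts, a single fixed circle stands in for one candidate window of a scan statistic, and all names (`snap_to_cell_center`, `evaluate_circle`) and coordinates are hypothetical. The sketch only shows why a tight cluster that straddles region boundaries becomes harder to capture once each visit is replaced by the center point of its region.

```python
import math

def snap_to_cell_center(point, cell_size=1.0):
    """Replace a point with the center of its grid cell.

    The square grid is an assumed stand-in for zip codes or census tracts.
    """
    x, y = point
    cx = (math.floor(x / cell_size) + 0.5) * cell_size
    cy = (math.floor(y / cell_size) + 0.5) * cell_size
    return (cx, cy)

def in_circle(point, center, radius):
    """True if the point lies inside or on the circle."""
    return math.dist(point, center) <= radius

def evaluate_circle(background, injected, center, radius):
    """Return (fraction of injected visits captured, background visits captured)."""
    captured = sum(in_circle(p, center, radius) for p in injected)
    extra = sum(in_circle(p, center, radius) for p in background)
    return captured / len(injected), extra

# Hypothetical data: four injected visits straddle the corner where four
# cells meet; one background visit lies in each of those cells.
injected = [(0.95, 0.95), (1.05, 0.95), (0.95, 1.05), (1.05, 1.05)]
background = [(0.2, 0.3), (1.7, 0.4), (0.4, 1.8), (1.6, 1.6)]

snapped_injected = [snap_to_cell_center(p) for p in injected]
snapped_background = [snap_to_cell_center(p) for p in background]

center = (1.0, 1.0)

# A small circle captures every injected visit when exact locations are used.
exact_result = evaluate_circle(background, injected, center, 0.2)

# The same circle captures none of them after snapping to cell centers.
snapped_result = evaluate_circle(snapped_background, snapped_injected, center, 0.2)

# A circle wide enough to reach the cell centers also pulls in every
# background visit assigned to those cells.
wide_result = evaluate_circle(snapped_background, snapped_injected, center, 0.75)

print(exact_result, snapped_result, wide_result)
```

In this toy setting the exact locations yield a window containing all injected visits and no others, while the aggregated locations force a choice between missing the cluster and including unrelated visits. That mirrors the abstract's criterion of containing at least half of the simulated points with few additional visits.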
