Location-based anonymization: comparison and evaluation of the Voronoi-based aggregation system

ABSTRACT Hospitals and health care organizations collect large amounts of detailed health care data that are in high demand among researchers. The holders of such data therefore need methods for releasing it without compromising the confidentiality of the individuals it describes. Because the geographic aspect of this data is increasingly relevant to research, an anonymization process must pay due attention to its geographic attributes. In this paper, we present a novel system for health care data anonymization. At the core of the system is the aggregation of an initial regionalization, guided by a Voronoi diagram. We compare our system with GeoLeader, another location-based anonymization system, and show that it produces results of comparable quality with a much faster running time.
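
To make the idea of Voronoi-guided aggregation concrete, the following is a minimal sketch and not the system described in the paper. It assumes that each initial region is represented by a centroid and a population count, that a set of seed points is supplied by the caller, and that a region joins the Voronoi cell of its nearest seed. The input format, the function names `aggregate_by_nearest_seed` and `undersized`, and the population threshold `k` are all hypothetical choices made for illustration.

```python
from math import hypot


def aggregate_by_nearest_seed(regions, seeds):
    """Group initial regions by the Voronoi cell that contains their centroid.

    regions: list of (name, x, y, population) tuples (hypothetical format).
    seeds:   list of (x, y) Voronoi sites chosen by the caller (an assumption).
    Returns a dict: seed index -> {"members": [names], "population": total}.
    """
    groups = {i: {"members": [], "population": 0} for i in range(len(seeds))}
    for name, x, y, pop in regions:
        # By definition, a point lies in the Voronoi cell of its closest seed.
        nearest = min(
            range(len(seeds)),
            key=lambda i: hypot(x - seeds[i][0], y - seeds[i][1]),
        )
        groups[nearest]["members"].append(name)
        groups[nearest]["population"] += pop
    return groups


def undersized(groups, k):
    """Return seed indices whose aggregated population is still below k."""
    return [i for i, g in groups.items() if g["population"] < k]


# Toy data: four small regions and two seeds (all values invented).
regions = [
    ("A", 0.0, 0.0, 40),
    ("B", 1.0, 0.0, 70),
    ("C", 9.0, 9.0, 30),
    ("D", 10.0, 10.0, 20),
]
seeds = [(0.5, 0.0), (10.0, 9.0)]

groups = aggregate_by_nearest_seed(regions, seeds)
too_small = undersized(groups, k=100)
```

In this toy run, regions A and B fall into the first cell and C and D into the second. Any cell reported by `undersized` would need further merging or suppression before release; how the actual system handles that step is not specified in the abstract.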
