Protecting count queries in study design

OBJECTIVE Clinical research institutions today provide tools that let researchers query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before they are reported, which trades away some of their utility in exchange for privacy. The goal of this study is to extend current query answering systems so that they guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize usefulness for their needs.

METHODS A perturbation mechanism was designed in which users are given options with respect to the scale and direction of the perturbation. The mechanism translates the true count, the user's preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn.

RESULTS Users can substantially influence the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, which provides a unified privacy accounting system that can support role-based trust levels. This study provides an open source, web-enabled tool for investigating, visually and numerically, the interaction between system parameters, including the required privacy level and user preference settings.

CONCLUSIONS Quantifying privacy allows system administrators to give users a privacy budget and to monitor its expenditure, which lets users control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not available in current study design systems.
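The mapping from true count, user preference, and privacy level to a sampling distribution can be illustrated with a small sketch. This is not the mechanism from the study: it assumes a standard exponential mechanism over a fixed integer range, with a hypothetical asymmetric loss in which `over_weight` and `under_weight` stand in for the user's preference about the direction of the perturbation and `epsilon` stands in for the privacy level. All function and parameter names are illustrative.

```python
import math
import random


def perturbed_count_distribution(true_count, epsilon, over_weight=1.0,
                                 under_weight=1.0, r_min=0, r_max=1000):
    """Return (candidates, probabilities) for an exponential-mechanism draw.

    Assumption: the loss of reporting r is over_weight * (r - true_count)
    when r overshoots and under_weight * (true_count - r) when it undershoots.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if over_weight <= 0 or under_weight <= 0:
        raise ValueError("weights must be positive")

    # Adding or removing one patient changes the true count by at most 1,
    # so the loss for any fixed r changes by at most the larger weight.
    sensitivity = max(over_weight, under_weight)

    # The candidate range is fixed in advance and does not depend on the data.
    candidates = list(range(r_min, r_max + 1))
    scores = []
    for r in candidates:
        if r >= true_count:
            loss = over_weight * (r - true_count)
        else:
            loss = under_weight * (true_count - r)
        # Exponential mechanism: log-weight = epsilon * utility / (2 * sensitivity).
        scores.append(-epsilon * loss / (2.0 * sensitivity))

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return candidates, [w / total for w in weights]


def draw_perturbed_count(true_count, epsilon, rng=None, **kwargs):
    """Sample one perturbed count from the distribution above."""
    rng = rng or random.Random()
    candidates, probs = perturbed_count_distribution(true_count, epsilon, **kwargs)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

Under these assumptions, setting `over_weight` above `under_weight` shifts probability mass toward reported counts below the true count, and a smaller `epsilon` spreads the distribution more widely. A user who would rather underestimate a cohort than overestimate it might call `draw_perturbed_count(50, 0.5, over_weight=4.0, under_weight=1.0, r_max=200)`.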
