Differential Privacy for Clinical Trial Data: Preliminary Evaluations

Differential privacy, a rigorous definition of privacy, has emerged from the cryptographic community. However, careful evaluation is still needed before these theoretical results can be applied to privacy preservation in everyday data mining and statistical analysis. In this paper we demonstrate how to integrate a differential privacy framework with classical statistical hypothesis testing in the domain of clinical trials, where personal information is sensitive. We develop concrete methodology that researchers can use. We derive rules for sample size adjustment under which both statistical efficiency and differential privacy can be achieved for specific tests on binomial random variables and contingency tables.
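To make the idea of a sample size adjustment concrete, the following is a minimal sketch and not the paper's own derivation. It assumes the standard Laplace mechanism applied to a binomial success count (sensitivity 1 when one participant's outcome changes), treats the trial size n as public, and uses an illustrative rule: pick the smallest enlarged sample size whose noisy proportion has variance no larger than the non-private proportion at the original size. All function names (`laplace_noise`, `private_proportion`, `noisy_variance`, `adjusted_sample_size`) are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_proportion(successes, n, epsilon, rng):
    """Release a success proportion via the Laplace mechanism.

    Assumption: the count has sensitivity 1, so noise of scale 1/epsilon
    gives epsilon-differential privacy for the count; n is treated as public.
    """
    noisy_count = successes + laplace_noise(1.0 / epsilon, rng)
    return noisy_count / n


def noisy_variance(p, n, epsilon):
    """Variance of the noisy proportion: binomial part plus Laplace part.

    Laplace(0, b) has variance 2*b^2; with b = 1/epsilon and division by n,
    the added term is 2 / (epsilon^2 * n^2).
    """
    return p * (1.0 - p) / n + 2.0 / (epsilon ** 2 * n ** 2)


def adjusted_sample_size(p, n, epsilon):
    """Smallest n' with noisy_variance(p, n', epsilon) <= p(1-p)/n.

    Illustrative rule only (an assumption, not taken from the paper).
    Requires 0 < p < 1. Solves v*n'^2 - v*n*n' - c*n >= 0 for n',
    where v = p(1-p) and c = 2/epsilon^2.
    """
    v = p * (1.0 - p)
    c = 2.0 / epsilon ** 2
    return math.ceil((n + math.sqrt(n * n + 4.0 * c * n / v)) / 2.0)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    n, epsilon = 100, 1.0
    print("noisy proportion:", private_proportion(40, n, epsilon, rng))
    print("adjusted sample size:", adjusted_sample_size(0.4, n, epsilon))
```

Under these assumptions the extra participants needed shrink as epsilon grows (weaker privacy) and as n grows, since the Laplace term decays as 1/n^2 while the binomial term decays as 1/n.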
