Privacy-Preserving Verification of Clinical Research

We treat the problem of privacy-preserving statistics verification in clinical research. We show that given aggregated results from statistical calculations, we can verify their correctness efficiently, without revealing any of the private inputs used for the calculation. Our construction is based on the primitive of Secure Multi-Party Computation from Shamir's Secret Sharing. Basically, our setting involves three parties: a hospital, which owns the private inputs, a clinical researcher, who lawfully processes the sensitive data to produce an aggregated statistical result, and a third party (usually several verifiers) assigned to verify this result for reliability and transparency reasons. Our solution guarantees that these verifiers only learn about the aggregated results (and what can be inferred from those about the underlying private data) and nothing more. By taking advantage of the particular scenario at hand (where certain intermediate results, e.g., the mean over the dataset, are available in the clear) and utilizing secret sharing primitives, our approach turns out to be practically efficient, which we underpin by performing several experiments on real patient data. Our results show that the privacy-preserving verification of the most commonly used statistical operations in clinical research presents itself as an important use case, where the concept of secure multi-party computation becomes employable in practice.

[1]  Dan Boneh,et al.  Linearly Homomorphic Signatures over Binary Fields and New Tools for Lattice-Based Signatures , 2011, Public Key Cryptography.

[2]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[3]  J. D. De Muth,et al.  Overview of biostatistics used in clinical research. , 2009, American journal of health-system pharmacy : AJHP : official journal of the American Society of Health-System Pharmacists.

[4]  Ivan Damgård,et al.  Secure Multiparty Computation Goes Live , 2009, Financial Cryptography.

[5]  Elaine Shi,et al.  Signatures of Correct Computation , 2013, TCC.

[6]  Andy P. Field,et al.  Discovering Statistics Using SPSS , 2000 .

[7]  Damien McAullay,et al.  Remote access methods for exploratory data analysis and statistical modelling: Privacy-Preserving Analytics® , 2008, Comput. Methods Programs Biomed..

[8]  K. Sowinski,et al.  Biostatistics primer: part I. , 2007, Nutrition in clinical practice : official publication of the American Society for Parenteral and Enteral Nutrition.

[9]  R. Fisher On the Interpretation of χ2 from Contingency Tables, and the Calculation of P , 2010 .

[10]  Kathleen Zellner,et al.  Statistics used in current nursing research. , 2007, The Journal of nursing education.

[11]  George T. Duncan,et al.  Obtaining Information while Preserving Privacy: A Markov Perturbation Method for Tabular Data , 1997 .

[12]  Wenliang Du,et al.  Privacy-preserving cooperative statistical analysis , 2001, Seventeenth Annual Computer Security Applications Conference.

[13]  J Ranstam,et al.  Fraud in medical research: an international survey of biostatisticians. ISCB Subcommittee on Fraud. , 2000, Controlled clinical trials.

[14]  Vinod Vaikuntanathan,et al.  How to Delegate and Verify in Public: Verifiable Computation from Attribute-based Encryption , 2012, IACR Cryptol. ePrint Arch..

[15]  Guy N. Rothblum,et al.  A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[16]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[17]  Yunghsiang Sam Han,et al.  Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification , 2004, SDM.

[18]  Yevgeniy Vahlis,et al.  Verifiable Delegation of Computation over Large Datasets , 2011, IACR Cryptol. ePrint Arch..

[19]  Jerome P. Reiter,et al.  Privacy-Preserving Analysis of Vertically Partitioned Data Using Secure Matrix Products , 2009 .

[20]  Stuart Haber,et al.  Privacy-Preserving Computation and Verification of Aggregate Queries on Outsourced Databases , 2009, Privacy Enhancing Technologies.

[21]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[22]  Dawn Xiaodong Song,et al.  Homomorphic Signature Schemes , 2002, CT-RSA.

[23]  Yuval Ishai,et al.  From Secrecy to Soundness: Efficient Verification via Secure Computation , 2010, ICALP.

[24]  Xenofontas A. Dimitropoulos,et al.  SEPIA: Privacy-Preserving Aggregation of Multi-Domain Network Events and Statistics , 2010, USENIX Security Symposium.

[25]  Abhi Shelat,et al.  Computing on Authenticated Data , 2012, Journal of Cryptology.

[26]  Dan Boneh,et al.  Homomorphic Signatures for Polynomial Functions , 2011, EUROCRYPT.

[27]  K. Sowinski,et al.  Biostatistics primer: part 2. , 2008, Nutrition in clinical practice : official publication of the American Society for Parenteral and Enteral Nutrition.

[28]  Val Jones,et al.  Predicting feedback compliance in a teletreatment application , 2010, 2010 3rd International Symposium on Applied Sciences in Biomedical and Communication Technologies (ISABEL 2010).

[29]  Craig Gentry,et al.  Non-interactive Verifiable Computing: Outsourcing Computation to Untrusted Workers , 2010, CRYPTO.

[30]  Tal Rabin,et al.  Simplified VSS and fast-track multiparty computations with applications to threshold cryptography , 1998, PODC '98.

[31]  K. Pearson On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling , 1900 .

[32]  D. Fanelli How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data , 2009, PloS one.

[33]  Q. Mcnemar Note on the sampling error of the difference between correlated proportions or percentages , 1947, Psychometrika.

[34]  Dan Bogdanov,et al.  Sharemind: A Framework for Fast Privacy-Preserving Computations , 2008, ESORICS.