Privacy-Preserving Elastic Net for Data Encrypted by Different Keys - With an Application on Biomarker Discovery

Elastic net is a popular linear regression tool and has many important applications, in particular, finding genomic biomarkers for cancers from gene expression profiles for personalized medicine (elastic net is currently the most accurate prediction method for this problem). There is an increasing trend for organizations to store their data (e.g. gene expression profiles) in an untrusted third-party cloud system in order to leverage both its storage capacity and computational power. Due to the privacy concern, data must be stored in its encrypted form. While there are quite a number of privacy-preserving data mining protocols on encrypted data, there does not exist one for elastic net. In this paper, we propose the first privacy-preserving elastic net protocol using two non-colluding servers. Our protocol is able to handle expression profiles encrypted from multiple medical units using different encryption keys. Thus, collaboration between multiple medical units are made possible without jeopardizing the privacy of data records. We formally prove that our protocol is secure and implemented the protocol. The experimental results show that our protocol runs reasonably fast, thus can be applied in practice.

[1]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[2]  Emmanuel Bresson,et al.  A Simple Public-Key Cryptosystem with a Double Trapdoor Decryption Mechanism and Its Applications , 2003, ASIACRYPT.

[3]  Gregory W. Wornell,et al.  Efficient homomorphic encryption on integer vectors and its applications , 2014, 2014 Information Theory and Applications Workshop (ITA).

[4]  Olivier Chapelle,et al.  Training a Support Vector Machine in the Primal , 2007, Neural Computation.

[5]  Justin Guinney,et al.  Systematic Assessment of Analytical Methods for Drug Sensitivity Prediction from Cancer Cell Line Data , 2013, Pacific Symposium on Biocomputing.

[6]  Ming-Syan Chen,et al.  Privacy-preserving outsourcing support vector machines with random transformation , 2010, KDD.

[7]  Adam A. Margolin,et al.  The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity , 2012, Nature.

[8]  Jaideep Vaidya,et al.  Knowledge and Information Systems , 2007 .

[9]  Yaniv Erlich,et al.  Routes for breaching and protecting genetic privacy , 2013, Nature Reviews Genetics.

[10]  Stefan Katzenbeisser,et al.  Efficiently Outsourcing Multiparty Computation Under Multiple Keys , 2013, IEEE Transactions on Information Forensics and Security.

[11]  Vinod Vaikuntanathan,et al.  On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption , 2012, STOC '12.

[12]  Matt Blaze,et al.  Divertible Protocols and Atomic Proxy Cryptography , 1998, EUROCRYPT.

[13]  Dan Boneh,et al.  Evaluating 2-DNF Formulas on Ciphertexts , 2005, TCC.

[14]  Dario Fiore,et al.  Using Linearly-Homomorphic Encryption to Evaluate Degree-2 Functions on Encrypted Data , 2015, CCS.

[15]  Jaideep Vaidya,et al.  Privacy-Preserving SVM Classification on Vertically Partitioned Data , 2006, PAKDD.

[16]  S. Ramaswamy,et al.  Systematic identification of genomic markers of drug sensitivity in cancer cells , 2012, Nature.

[17]  Nikos Mamoulis,et al.  Secure kNN computation on encrypted databases , 2009, SIGMOD Conference.

[18]  Michael Zohner,et al.  ABY - A Framework for Efficient Mixed-Protocol Secure Two-Party Computation , 2015, NDSS.

[19]  Ming Li,et al.  A tale of two clouds: Computing on data encrypted under multiple keys , 2014, 2014 IEEE Conference on Communications and Network Security.

[20]  Wei Zhang,et al.  Encrypted SVM for Outsourced Data Mining , 2015, 2015 IEEE 8th International Conference on Cloud Computing.

[21]  Lakshminarayanan Subramanian,et al.  Two-Party Computation Model for Privacy-Preserving Queries over Distributed Databases , 2009, NDSS.

[22]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[23]  Shuiwang Ji,et al.  SLEP: Sparse Learning with Efficient Projections , 2011 .

[24]  Stephen Tyree,et al.  Parallel Support Vector Machines in Practice , 2014, ArXiv.

[25]  Tamir Tassa,et al.  Oblivious evaluation of multivariate polynomials , 2013, J. Math. Cryptol..

[26]  Ming Li,et al.  Computing encrypted cloud data efficiently under multiple keys , 2013, 2013 IEEE Conference on Communications and Network Security (CNS).

[27]  David G. Covell,et al.  Data Mining Approaches for Genomic Biomarker Development: Applications Using Drug Screening Data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia , 2015, PloS one.

[28]  Martin Jaggi,et al.  An Equivalence between the Lasso and Support Vector Machines , 2013, ArXiv.

[29]  Taneli Mielikäinen,et al.  Cryptographically private support vector machines , 2006, KDD '06.

[30]  Jaideep Vaidya,et al.  Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data , 2006, SAC.

[31]  Zekeriya Erkin,et al.  Generating Private Recommendations Efficiently Using Homomorphic Encryption and Data Packing , 2012, IEEE Transactions on Information Forensics and Security.

[32]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[33]  Torben P. Pedersen Non-Interactive and Information-Theoretic Secure Verifiable Secret Sharing , 1991, CRYPTO.

[34]  Tong Zhang,et al.  Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Mathematical Programming.

[35]  Yixin Chen,et al.  A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing , 2014, AAAI.

[36]  S. Sathiya Keerthi,et al.  Building Support Vector Machines with Reduced Classifier Complexity , 2006, J. Mach. Learn. Res..

[37]  M. Duffy,et al.  A personalized approach to cancer treatment: how biomarkers can help. , 2008, Clinical chemistry.

[38]  Sridhar Ramaswamy,et al.  Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells , 2012, Nucleic Acids Res..

[39]  Marten van Dijk,et al.  On the Impossibility of Cryptography Alone for Privacy-Preserving Cloud Computing , 2010, HotSec.

[40]  K. Hao,et al.  Bayesian method to predict individual SNP genotypes from gene expression data , 2012, Nature Genetics.

[41]  Tadayoshi Fushiki,et al.  Estimation of prediction error by using K-fold cross-validation , 2011, Stat. Comput..

[42]  Yucel Saygin,et al.  Secret charing vs. encryption-based techniques for privacy preserving data mining , 2007 .