Privacy-preserving and high-accurate outsourced disease predictor on random forest

Abstract: In disease prediction applications, training data is commonly distributed across multiple institutions. Several data sources may be willing to contribute their datasets so that a more precise classifier can be trained on a larger training set. However, integrating multi-source datasets can leak sensitive information to untrusted data sources. It is therefore imperative to protect the privacy of multi-source data while the predictor is being constructed. Moreover, because disease diagnosis bears directly on health and life, prediction accuracy must be guaranteed. In this paper, we propose a privacy-preserving and highly accurate outsourced disease predictor based on random forest, called PHPR. The PHPR system performs secure training on medical information belonging to different data owners and makes accurate predictions. In addition, the original data and the computed results, which lie in the rational field, can be securely processed and stored in the cloud without privacy leakage. Specifically, we first design privacy-preserving computation protocols over rational numbers to guarantee computation accuracy and to handle outsourced operations on the fly. We then demonstrate that the PHPR system achieves a secure disease predictor. Finally, experimental results on real-world datasets show that the PHPR system not only provides a secure disease predictor over ciphertexts but also maintains the same prediction accuracy as the original classifier.
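
To make the idea of exact computation over encrypted rational numbers concrete, the following is a minimal sketch and not the PHPR protocol itself. It assumes a textbook Paillier cryptosystem with generator g = N + 1 and a toy key built from two small Mersenne primes. Each rational is represented as an encrypted numerator paired with a plaintext denominator, a simplification chosen only to keep the example short. All function names (`encrypt`, `decrypt`, `enc_rational`, `add_rational`, `dec_rational`) are hypothetical.

```python
import math
import secrets
from fractions import Fraction

# Toy key from two known Mersenne primes. Far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # inverse of LAM mod N; valid for generator g = N + 1


def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1; negative values wrap modulo N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Recover the plaintext, mapping the upper half of Z_N to negatives."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def enc_rational(x: Fraction):
    """Assumed encoding: (Enc(numerator), plaintext denominator)."""
    return encrypt(x.numerator), x.denominator


def add_rational(a, b):
    """Add two encoded rationals without decrypting the numerators."""
    (ca, da), (cb, db) = a, b
    # Enc(na)^db * Enc(nb)^da = Enc(na*db + nb*da)
    numerator = pow(ca, db, N2) * pow(cb, da, N2) % N2
    return numerator, da * db


def dec_rational(c) -> Fraction:
    """Decrypt the numerator and rebuild the exact rational value."""
    numerator, denominator = c
    return Fraction(decrypt(numerator), denominator)


total = add_rational(enc_rational(Fraction(1, 3)), enc_rational(Fraction(-5, 12)))
print(dec_rational(total))  # -1/12
```

Because numerators stay integers and denominators are multiplied exactly, no rounding occurs, which is the property that fixed-point or floating-point encodings would lose. A real deployment would need a large key, and would have to decide how to protect denominators as well.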
