A Fully Private Pipeline for Deep Learning on Electronic Health Records

We introduce an end-to-end private deep learning framework and apply it to predicting 30-day readmission from electronic health records. By using differential privacy during training and homomorphic encryption during inference, we demonstrate that the proposed pipeline can maintain high performance while providing robust privacy guarantees against information leakage from data transmission and from attacks against the model. We also explore several techniques to address the privacy-utility trade-off in deploying neural networks with privacy mechanisms, improving the accuracy of differentially private training and reducing the computation cost of encrypted operations using ideas from both machine learning and cryptography.
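As an illustration of what differentially private training typically involves, the following is a minimal sketch of one DP-SGD update (per-example gradient clipping followed by Gaussian noise) for a logistic regression model in NumPy. The abstract does not specify the exact mechanism or model used, so the function name `dp_sgd_step`, its parameters, and the choice of logistic regression are assumptions made only for this example; privacy accounting (tracking the resulting epsilon) is omitted.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD step for logistic regression (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-example gradients of the logistic loss: (sigmoid(x.w) - y) * x
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_example_grads = (probs - y)[:, None] * X
    # Clip each example's gradient so its L2 norm is at most clip_norm
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum clipped gradients, add Gaussian noise with standard deviation
    # noise_multiplier * clip_norm, then average over the batch
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * noisy_grad

# Usage on synthetic data (not real health records)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 8))
y_demo = (rng.random(64) > 0.5).astype(float)
w_demo = dp_sgd_step(np.zeros(8), X_demo, y_demo, rng=rng)
```

Clipping bounds how much any single record can influence the update, and the noise scale is tied to that bound; a larger `noise_multiplier` gives stronger privacy at the cost of accuracy, which is the privacy-utility trade-off the abstract refers to.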
