Encrypted statistical machine learning: new privacy preserving methods

We present two new statistical machine learning methods designed to learn from data encrypted under a fully homomorphic encryption (FHE) scheme. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for two classifiers: extremely random forests, which involve a new cryptographic stochastic fraction estimator, and naïve Bayes, which involves a semi-parametric model for the class decision boundary. We show how both can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods.

[1] David J. Wu, et al. Using Homomorphic Encryption for Large Scale Statistical Analysis, 2012.

[2] Michael Naehrig, et al. ML Confidential: Machine Learning on Encrypted Data, 2012, ICISC.

[3] Pierre Geurts, et al. Extremely randomized trees, 2006, Machine Learning.

[4] Craig Gentry, et al. A fully homomorphic encryption scheme, 2009.

[5] M R Anderlik, et al. Privacy and confidentiality of genetic information: what rules for the new science?, 2001, Annual review of genomics and human genetics.

[6] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[7] Michael Naehrig, et al. Private Predictive Analysis on Encrypted Medical Data, 2014, IACR Cryptol. ePrint Arch.

[8] G. Ginsburg, et al. Medical genomics: Gather and use genetic data in health care, 2014, Nature.

[9] Steven E. Brenner. Be prepared for the big genome leak, 2013, Nature.

[10] Kamin Whitehouse, et al. The Data Furnace: Heating Up with Cloud Computing, 2011, HotCloud.

[11] Ronald L. Rivest, et al. On Data Banks and Privacy Homomorphisms, 1978.

[12] Shafi Goldwasser, et al. Machine Learning Classification over Encrypted Data, 2015, NDSS.

[13] Eric R. Ziegel, et al. The Elements of Statistical Learning, 2003, Technometrics.

[14] David R. Karger, et al. Tackling the Poor Assumptions of Naive Bayes Text Classifiers, 2003, ICML.

[15] Adele Cutler, et al. PERT – Perfect Random Tree Ensembles, 2001.

[16] Louis J. M. Aslett, et al. A review of homomorphic encryption and software tools for encrypted statistical machine learning, 2015, ArXiv.

[17] Senén Barro, et al. Do we need hundreds of classifiers to solve real world classification problems?, 2014, J. Mach. Learn. Res.

[18] Pedro M. Domingos, et al. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, 1997, Machine Learning.

[19] Joan Scott, et al. Public opinion about the importance of privacy in biobank research, 2009, American journal of human genetics.

[20] Michael I. Jordan, et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes, 2001, NIPS.

[21] Craig Gentry, et al. Computing arbitrary functions of encrypted data, 2010, CACM.

[22] Misha Angrist. Genetic privacy needs a more nuanced approach, 2013, Nature.

[23] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[24] Chris Peikert, et al. Better Key Sizes (and Attacks) for LWE-Based Encryption, 2011, CT-RSA.

[25] Frederik Vercauteren, et al. Somewhat Practical Fully Homomorphic Encryption, 2012, IACR Cryptol. ePrint Arch.

[26] D. Hand, et al. Idiot's Bayes—Not So Stupid After All?, 2001.