Secure k-ish Nearest Neighbors Classifier

Abstract The k-nearest neighbors (kNN) classifier predicts the class of a query, q, by taking the majority class of its k nearest neighbors in an existing (already classified) database, S. In secure kNN, q and S are owned by two different parties, and q is classified without either party sharing its data. In this work we present a classifier based on kNN that is more efficient to implement with homomorphic encryption (HE). The efficiency of our classifier comes from a relaxation: we consider κ nearest neighbors, where κ ≈ k with a probability that increases as the statistical distance between a Gaussian distribution and the distribution of the distances from q to S decreases. We call our classifier k-ish Nearest Neighbors (k-ish NN). For the implementation we introduce a double-blinded coin toss, in which both the bias and the output of the toss are encrypted. We use it to approximate the average and variance of the distances from q to S in a scalable circuit whose depth is independent of |S|. We believe these techniques to be of independent interest. We implemented our classifier in an open-source library based on HElib and tested it on a breast tumor database. The accuracy of our classifier is comparable to that of current state-of-the-art (non-HE) MPC solutions, which have better running time but worse communication complexity. Its communication complexity is similar to that of naive HE implementations, which have worse running time.
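To make the relaxation concrete, the following is a minimal plaintext sketch, not the encrypted implementation described above. It contrasts plain kNN with one plausible reading of the k-ish rule: the distances from q to S are assumed to be roughly Gaussian, so a threshold at the k/|S| quantile of a Gaussian with the same mean and standard deviation should admit κ ≈ k neighbors. The function names, the Euclidean metric, and the exact threshold rule are assumptions made for illustration; the actual system evaluates everything under HE and only approximates the mean and variance.

```python
from collections import Counter
from math import dist
from statistics import NormalDist, fmean, pstdev


def knn_classify(q, S, k):
    """Plain kNN: majority label among the k points of S closest to q."""
    nearest = sorted(S, key=lambda item: dist(q, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kish_classify(q, S, k):
    """Hypothetical k-ish rule: vote among all points closer than a threshold.

    The threshold is the k/|S| quantile of a Gaussian fitted to the
    distances, so no sorting is required. Returns (label, kappa), where
    kappa is the number of neighbors that actually voted.
    Requires 0 < k < len(S).
    """
    dists = [dist(q, x) for x, _ in S]
    mu, sigma = fmean(dists), pstdev(dists)
    if sigma == 0:
        # All distances are equal; a Gaussian quantile is undefined here.
        threshold = float("inf")
    else:
        threshold = NormalDist(mu, sigma).inv_cdf(k / len(S))
    votes = Counter(label for d, (_, label) in zip(dists, S) if d < threshold)
    kappa = sum(votes.values())
    if kappa == 0:
        return None, 0
    return votes.most_common(1)[0][0], kappa


if __name__ == "__main__":
    # Toy database with two well-separated classes (illustrative data only).
    S = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b")]
    q = (0.05, 0.05)
    print(knn_classify(q, S, 3))
    print(kish_classify(q, S, 3))
```

The sketch avoids sorting because a threshold comparison needs only the mean and variance of the distances, which are the two quantities the abstract says are approximated in a circuit whose depth is independent of |S|. When the distances are far from Gaussian, κ can differ noticeably from k.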
