Efficient Privacy-Preserving k-Nearest Neighbor Search

We give efficient protocols for secure and private k-nearest neighbor (k-NN) search, when the data is distributed between two parties who want to cooperatively compute the answers without revealing to each other their private data. Our protocol for the single-step k-NN search is provably secure and has linear computation and communication complexity. Previous work on this problem had a quadratic complexity, and also leaked information about the parties' inputs. We adapt our techniquesto also solve the general multi-step k-NN search, and describe a specific embodiment of it for the case of sequence data. The protocols and correctness proofs can be extended to suit other privacy-preserving data mining tasks, such as classification and outlier detection.

[1]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[2]  Bart Goethals,et al.  On Private Scalar Product Computation for Privacy-Preserving Data Mining , 2004, ICISC.

[3]  Oded Goldreich,et al.  Foundations of Cryptography: Volume 2, Basic Applications , 2004 .

[4]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[5]  Chris Clifton,et al.  Privacy-preserving outlier detection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[6]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[7]  Jon M. Kleinberg,et al.  Two algorithms for nearest-neighbor search in high dimensions , 1997, STOC '97.

[8]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[9]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[10]  I. Damglurd Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation , 2006 .

[11]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[12]  Ran Canetti,et al.  Security and Composition of Multiparty Cryptographic Protocols , 2000, Journal of Cryptology.

[13]  Hans-Peter Kriegel,et al.  Fast nearest neighbor search in high-dimensional space , 1998, Proceedings 14th International Conference on Data Engineering.

[14]  Wenliang Du,et al.  Secure and private sequence comparisons , 2003, WPES '03.

[15]  Rafail Ostrovsky,et al.  Software protection and simulation on oblivious RAMs , 1996, JACM.

[16]  R. K. Shyamasundar,et al.  Introduction to algorithms , 1996 .

[17]  Elisa Bertino,et al.  State-of-the-art in privacy preserving data mining , 2004, SGMD.

[18]  Hans-Peter Kriegel,et al.  Efficient User-Adaptable Similarity Search in Large Multimedia Databases , 1997, VLDB.

[19]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[20]  Wenliang Du,et al.  A Hybrid Multi-group Privacy-Preserving Approach for Building Decision Trees , 2007, PAKDD.

[21]  Hans-Peter Kriegel,et al.  Optimal multi-step k-nearest neighbor search , 1998, SIGMOD '98.

[22]  Benny Pinkas,et al.  Fairplay - Secure Two-Party Computation System , 2004, USENIX Security Symposium.

[23]  Ling Liu,et al.  Mining multiple private databases using a kNN classifier , 2007, SAC '07.

[24]  Chris Clifton,et al.  Privately Computing a Distributed k-nn Classifier , 2004, PKDD.

[25]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[26]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[27]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[28]  Vipin Kumar,et al.  Privacy Preserving Nearest Neighbor Search , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[29]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[30]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[31]  Michael J. Fischer,et al.  The String-to-String Correction Problem , 1974, JACM.

[32]  Wenliang Du,et al.  Privacy-preserving cooperative statistical analysis , 2001, Seventeenth Annual Computer Security Applications Conference.