Privacy Preserving Nearest Neighbor Search

Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this work we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification

[1]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[2]  A. Yao How to generate and exchange secrets , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[3]  Rafail Ostrovsky,et al.  Efficient search for approximate nearest neighbor in high dimensional spaces , 1998, STOC '98.

[4]  Ling Liu,et al.  Mining Multiple Private Databases using a Privacy Preserving kNN Classifier , 2007 .

[5]  Somesh Jha,et al.  Privacy Preserving Clustering , 2005, ESORICS.

[6]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[7]  Dawn Xiaodong Song,et al.  Privacy-Preserving Set Operations , 2005, CRYPTO.

[8]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[9]  Qi Wang,et al.  Random-data perturbation techniques and privacy-preserving data mining , 2005, Knowledge and Information Systems.

[10]  Mikhail J. Atallah,et al.  Private collaborative forecasting and benchmarking , 2004, WPES '04.

[11]  Marc Fischlin,et al.  A Cost-Effective Pay-Per-Multiplication Comparison Method for Millionaires , 2001, CT-RSA.

[12]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[13]  Wenliang Du,et al.  Privacy-preserving cooperative statistical analysis , 2001, Seventeenth Annual Computer Security Applications Conference.

[14]  Matthew K. Franklin,et al.  A survey of key evolving cryptosystems , 2006, Int. J. Secur. Networks.

[15]  Mostafa H. Ammar,et al.  Prefix-preserving IP address anonymization: measurement-based security evaluation and a new cryptography-based scheme , 2004, Comput. Networks.

[16]  Eike Kiltz,et al.  Unconditionally Secure Constant Round Multi-Party Computation for Equality, Comparison, Bits and Exponentiation , 2006, IACR Cryptol. ePrint Arch..

[17]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[18]  Vipin Kumar,et al.  Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data , 2003, SDM.

[19]  Stan Matwin,et al.  Privacy Preserving K-nearest Neighbor Classification , 2005, Int. J. Netw. Secur..

[20]  Jaideep Srivastava,et al.  Feature Mining for Prediction of Degree of Liver Fibrosis , 2005, AMIA.

[21]  Osmar R. Zaïane,et al.  Achieving Privacy Preservation when Sharing Data for Clustering , 2004, Secure Data Management.

[22]  David P. Woodruff,et al.  Polylogarithmic Private Approximations and Efficient Matching , 2006, TCC.

[23]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[24]  Ananth Grama,et al.  An efficient protocol for Yao's millionaires' problem , 2003, 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the.

[25]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[26]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[27]  R. Jarvis,et al.  ClusteringUsing a Similarity Measure Based on SharedNear Neighbors , 1973 .

[28]  Rakesh Agrawal,et al.  Privacy-preserving data mining , 2000, SIGMOD 2000.

[29]  Bart Goethals,et al.  On Private Scalar Product Computation for Privacy-Preserving Data Mining , 2004, ICISC.

[30]  Oded Goldreich,et al.  The Foundations of Cryptography - Volume 2: Basic Applications , 2001 .

[31]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[32]  Chris Clifton,et al.  Privately Computing a Distributed k-nn Classifier , 2004, PKDD.

[33]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[34]  Joydeep Ghosh,et al.  Privacy-preserving distributed clustering using generative models , 2003, Third IEEE International Conference on Data Mining.

[35]  Hans-Peter Kriegel,et al.  OPTICS-OF: Identifying Local Outliers , 1999, PKDD.

[36]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[37]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[38]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[39]  Sven Laur,et al.  On Private Similarity Search Protocols , 2004 .

[40]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[41]  Chris Clifton,et al.  Privacy-preserving outlier detection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[42]  Wenjun Luo,et al.  A study of secure multi-party elementary function computation protocols , 2004, InfoSecu '04.

[43]  Jaideep Srivastava,et al.  Data Mining for Network Intrusion Detection , 2002 .

[44]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[45]  Levent Ertoz,et al.  A New Shared Nearest Neighbor Clustering Algorithm and its Applications , 2002 .

[46]  Jayant R. Haritsa,et al.  Maintaining Data Privacy in Association Rule Mining , 2002, VLDB.

[47]  Chris Clifton,et al.  Privacy-Preserving Decision Trees over Vertically Partitioned Data , 2005, DBSec.

[48]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .

[49]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[50]  Stanley R. M. Oliveira,et al.  Privacy-Preserving Clustering by Object Similarity-Based Representation and Dimensionality Reduction Transformation , 2004 .

[51]  Rebecca N. Wright,et al.  Privacy-preserving distributed k-means clustering over arbitrarily partitioned data , 2005, KDD '05.