Privacy-Preserving Classification of Customer Data without Loss of Accuracy

Privacy has become an increasingly important issue in data mining. In this paper, we consider a scenario in which a data miner surveys a large number of customers to learn classiflcation rules on their data, while the sensitive attributes of these customers need to be protected. Solutions have been proposed to address this problem using randomization techniques. Such solutions exhibit a tradeofi of accuracy and privacy: the more each customer’s private information is protected, the less accurate result the miner obtains; conversely, the more accurate the result, the less privacy for the customers. In this paper, we propose a simple cryptographic approach that is e‐cient even in a many-customer setting, provides strong privacy for each customer, and does not lose any accuracy as the cost of privacy. Our key technical contribution is a privacy-preserving method that allows a data miner to compute frequencies of values or tuples of values in the customers’ data, without revealing the privacy-sensitive part of the data. Unlike general-purpose cryptographic protocols, this method requires no interaction between customers, and each customer only needs to send a single ∞ow of communication to the data miner. However, we are still able to ensure that nothing about the sensitive data beyond the desired frequencies is revealed to the data miner. To illustrate the power of our approach, we use our frequency mining computation to obtain a privacypreserving naive Bayes classifler learning algorithm. Initial experimental results demonstrate the practical e‐ciency of our solution. We also suggest some other applications of privacy-preserving frequency mining.

[1]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[2]  Silvio Micali,et al.  Probabilistic Encryption , 1984, J. Comput. Syst. Sci..

[3]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[4]  Silvio Micali,et al.  How to play ANY mental game , 1987, STOC.

[5]  Avi Wigderson,et al.  Completeness theorems for non-cryptographic fault-tolerant distributed computation , 1988, STOC '88.

[6]  Nabil R. Adam,et al.  Security-control methods for statistical databases: a comparative study , 1989, ACM Comput. Surv..

[7]  Josh Benaloh,et al.  Receipt-free secret-ballot elections (extended abstract) , 1994, STOC '94.

[8]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[9]  염흥렬,et al.  [서평]「Applied Cryptography」 , 1997 .

[10]  Yiannis Tsiounis,et al.  On the Security of ElGamal Based Encryption , 1998, Public Key Cryptography.

[11]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[12]  Kazue Sako,et al.  Efficient Receipt-Free Voting Based on Homomorphic Encryption , 2000, EUROCRYPT.

[13]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[14]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[15]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[16]  Jayant R. Haritsa,et al.  Maintaining Data Privacy in Association Rule Mining , 2002, VLDB.

[17]  Murat Kantarcioglu,et al.  An architecture for privacy-preserving mining of client information , 2002 .

[18]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[19]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[20]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[21]  Wenliang Du,et al.  Using randomized response techniques for privacy-preserving data mining , 2003, KDD '03.

[22]  Qi Wang,et al.  On the privacy preserving properties of random data perturbation techniques , 2003, Third IEEE International Conference on Data Mining.

[23]  Jaideep Vaidya,et al.  Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data , 2003 .

[24]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[25]  Chris Clifton,et al.  Privacy Preserving Naïve Bayes Classifier for Vertically Partitioned Data , 2004, SDM.

[26]  Rebecca N. Wright,et al.  Experimental Analysis of Privacy-Preserving Statistics Computation , 2004, Secure Data Management.

[27]  Rebecca N. Wright,et al.  Privacy-preserving Bayesian network structure computation on distributed heterogeneous data , 2004, KDD.

[28]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[29]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[30]  Markus Jakobsson,et al.  Cryptographic Randomized Response Techniques , 2003, Public Key Cryptography.

[31]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .