Attacks on Anonymization-Based Privacy-Preserving: A Survey for Data Mining and Data Publishing

Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the task may be unknown at the time the data is published. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data mining. Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows privacy-sensitive data to be shared for analysis. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, and (α,k)-anonymity. In particular, all known mechanisms try to minimize information loss, and this very attempt provides a loophole for attacks. The aim of this paper is to survey the most common attack techniques against anonymization-based PPDM and PPDP and to explain their effects on data privacy.
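The following Python sketch makes several of the privacy models named above concrete. It checks a toy generalized table against four criteria:

- k-anonymity.
- Distinct l-diversity, the simplest l-diversity variant.
- (α,k)-anonymity for a single sensitive value.
- A t-closeness distance.

The table, attribute names, and function names are hypothetical. The t-closeness check assumes a categorical sensitive attribute with equal ground distance between values. Under that assumption the Earth Mover's Distance reduces to the total variation distance. This is an illustration under those assumptions, not a reference implementation of any cited model.

```python
from collections import Counter, defaultdict

# Hypothetical toy release: quasi-identifiers (QI) "age" and "zip" are
# already generalized; "disease" is the sensitive attribute (SA).
TABLE = [
    {"age": "2*", "zip": "130**", "disease": "flu"},
    {"age": "2*", "zip": "130**", "disease": "flu"},
    {"age": "2*", "zip": "130**", "disease": "cancer"},
    {"age": "3*", "zip": "148**", "disease": "hiv"},
    {"age": "3*", "zip": "148**", "disease": "flu"},
    {"age": "3*", "zip": "148**", "disease": "cancer"},
]
QI = ("age", "zip")
SA = "disease"


def equivalence_classes(table, qi):
    """Group records that share the same generalized QI values."""
    classes = defaultdict(list)
    for row in table:
        classes[tuple(row[a] for a in qi)].append(row)
    return list(classes.values())


def is_k_anonymous(table, qi, k):
    """Every QI combination is shared by at least k records."""
    return all(len(c) >= k for c in equivalence_classes(table, qi))


def is_distinct_l_diverse(table, qi, sa, ell):
    """Distinct l-diversity: each class holds at least `ell` different SA values."""
    return all(len({r[sa] for r in c}) >= ell for c in equivalence_classes(table, qi))


def is_alpha_k_anonymous(table, qi, sa, alpha, k, value):
    """(alpha, k)-anonymity for one sensitive value: the table is k-anonymous
    and the relative frequency of `value` in every class is at most alpha."""
    if not is_k_anonymous(table, qi, k):
        return False
    return all(
        sum(r[sa] == value for r in c) / len(c) <= alpha
        for c in equivalence_classes(table, qi)
    )


def t_closeness_distance(table, qi, sa):
    """Largest distance between a class's SA distribution and the whole
    table's SA distribution. Assumes equal ground distance between
    categories, so EMD equals total variation: 0.5 * sum |p_i - q_i|."""
    overall = Counter(r[sa] for r in table)
    n = len(table)
    worst = 0.0
    for c in equivalence_classes(table, qi):
        local = Counter(r[sa] for r in c)
        dist = 0.5 * sum(abs(local[v] / len(c) - overall[v] / n) for v in overall)
        worst = max(worst, dist)
    return worst
```

On this toy table the release is 3-anonymous and distinct 2-diverse. It still fails (0.5, 3)-anonymity for `flu`, because two of the three records in the first class share that value. Its largest t-closeness distance is 1/6.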
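The loophole created by minimizing information loss is known in the literature as a minimality attack. It can be sketched by brute-force enumeration. The scenario below is entirely hypothetical:

- Six people have exact quasi-identifier values q1 (two people) or q2 (four people).
- Two of them have a sensitive value.
- The requirement is that no QI group be more than 50% sensitive.
- The published table merged all six people into one group.

We assume the attacker knows three things: each person's QI value, that all consistent worlds are equally likely, and that the anonymizer generalizes only when the raw table violates the requirement. The names and constants are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical raw table, grouped by exact quasi-identifier value:
# two people share QI value "q1" and four share "q2".
RAW_GROUPS = {"q1": ["alice", "bob"], "q2": ["carol", "dave", "erin", "frank"]}
NUM_SENSITIVE = 2        # sensitive records visible in the published table
ALPHA = Fraction(1, 2)   # requirement: at most 50% sensitive per QI group


def raw_table_violates(sensitive):
    """True if some exact-QI group exceeds the alpha bound."""
    return any(
        Fraction(sum(p in sensitive for p in members), len(members)) > ALPHA
        for members in RAW_GROUPS.values()
    )


def posterior(minimality_aware):
    """P(person is sensitive) under a uniform prior over consistent worlds."""
    people = [p for members in RAW_GROUPS.values() for p in members]
    worlds = [set(w) for w in combinations(people, NUM_SENSITIVE)]
    if minimality_aware:
        # A minimal anonymizer would have published the raw groups unchanged
        # if they already met the requirement, so those worlds are ruled out.
        worlds = [w for w in worlds if raw_table_violates(w)]
    return {p: Fraction(sum(p in w for w in worlds), len(worlds)) for p in people}
```

A naive attacker who sees only the merged group assigns every person probability 1/3. The minimality-aware attacker keeps only the worlds in which the raw table violated the bound. Here the only such world puts both sensitive records in the q1 group, so Alice and Bob are each identified with certainty.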

[1] Chris Clifton, et al. Privacy-Preserving Data Mining, 2006, Encyclopedia of Database Systems.

[2] Raymond Chi-Wing Wong, et al. (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing, 2006, KDD '06.

[3] Arie Shoshani, et al. Statistical Databases: Characteristics, Problems, and some Solutions, 1982, VLDB.

[4] Jeffrey W. Seifert, Data Mining and Homeland Security: An Overview, 2008.

[5] Andrew Chi-Chih Yao, et al. Protocols for secure computations, 1982, FOCS 1982.

[6] Nabil R. Adam, et al. Security-control methods for statistical databases: a comparative study, 1989, ACM Comput. Surv.

[7] Cynthia Dwork, et al. Differential Privacy: A Survey of Results, 2008, TAMC.

[8] Ashwin Machanavajjhala, et al. l-Diversity: Privacy Beyond k-Anonymity, 2006, ICDE.

[9] Sheng Zhong, et al. Privacy-Preserving Classification of Customer Data without Loss of Accuracy, 2005, SDM.

[10] Benny Pinkas, et al. Cryptographic techniques for privacy-preserving data mining, 2002, SKDD.

[11] W. Seifert, et al. Congressional Research Service Report RL31798: Data Mining and Homeland Security: An Overview, 2009.

[12] Chris Clifton, et al. Privacy-preserving k-means clustering over vertically partitioned data, 2003, KDD '03.

[13] Chris Clifton, et al. Privately Computing a Distributed k-nn Classifier, 2004, PKDD.

[14] Jayant R. Haritsa, et al. A Framework for High-Accuracy Privacy-Preserving Mining, 2005, ICDE.

[15] Johnson, et al. U.S. Immigration Reform, Homeland Security, and Global Economic Competitiveness in the Aftermath of the September 11, 2001 Terrorist Attacks, 2002.

[16] Chris Clifton, et al. Privacy-preserving distributed mining of association rules on horizontally partitioned data, 2004, IEEE Transactions on Knowledge and Data Engineering.

[17] Yufei Tao, et al. Anatomy: simple and effective privacy preservation, 2006, VLDB.

[18] N. Maheshwarkar, et al. Privacy Issues for K-anonymity Model, 2011.

[19] Ramakrishnan Srikant, et al. Privacy-preserving data mining, 2000, SIGMOD '00.

[20] Kun Liu, et al. Random projection-based multiplicative data perturbation for privacy preserving distributed data mining, 2006, IEEE Transactions on Knowledge and Data Engineering.

[21] Aaron Roth, et al. A learning theory approach to noninteractive database privacy, 2011, JACM.

[22] Ninghui Li, et al. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[23] Ruth Brand, et al. Microdata Protection through Noise Addition, 2002, Inference Control in Statistical Databases.

[24] Irit Dinur, et al. Revealing information while preserving privacy, 2003, PODS.

[25] Tom Fawcett, et al. Activity monitoring: noticing interesting changes in behavior, 1999, KDD '99.

[26] Alexandre V. Evfimievski, et al. Randomization in privacy preserving data mining, 2002, SKDD.

[27] Yufei Tao, et al. M-invariance: towards privacy preserving re-publication of dynamic datasets, 2007, SIGMOD '07.

[28] Raymond Chi-Wing Wong, et al. Privacy preserving serial data publishing by role composition, 2008, Proc. VLDB Endow.

[29] Raymond Chi-Wing Wong, et al. Privacy-preserving frequent pattern mining across private databases, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[30] Wenliang Du, et al. Building decision tree classifier on private data, 2002.

[31] David J. DeWitt, et al. Workload-aware anonymization, 2006, KDD '06.

[32] David J. DeWitt, et al. Incognito: efficient full-domain K-anonymity, 2005, SIGMOD '05.

[33] Chris Clifton, et al. Privacy-preserving distributed data mining on horizontally partitioned data, 2004.

[34] Pierangela Samarati, et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression, 1998.

[35] Elisa Bertino, et al. State-of-the-art in privacy preserving data mining, 2004, SGMD.

[36] Panos Kalnis, et al. Privacy-preserving anonymization of set-valued data, 2008, Proc. VLDB Endow.

[37] Jaideep Vaidya, et al. Privacy preserving association rule mining in vertically partitioned data, 2002, KDD.

[38] Cynthia Dwork, et al. Practical privacy: the SuLQ framework, 2005, PODS.

[39] Yingjie Wu, et al. K-Anonymity Based on Sensitive Tuples, 2009, 2009 First International Workshop on Database Technology and Applications.

[40] Bin Li, et al. A Multi-Dimensional K-Anonymity Model for Hierarchical Data, 2008, 2008 International Symposium on Electronic Commerce and Security.

[41] Yufei Tao, et al. Personalized privacy preservation, 2006, Privacy-Preserving Data Mining.

[42] Yunghsiang Sam Han, et al. Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification, 2004, SDM.

[43] Chris Clifton, et al. Tools for privacy preserving distributed data mining, 2002, SKDD.

[44] A. Yao, et al. Fair exchange with a semi-trusted third party (extended abstract), 1997, CCS '97.