Privacy-Preserving Data Mining: Models and Algorithms

Advances in hardware technology have increased the capability to store and record personal data about consumers and individuals, raising concerns that such data may be used for intrusive or malicious purposes. Privacy-Preserving Data Mining: Models and Algorithms presents a number of techniques for performing data mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques, cryptographic methods and protocols for data sharing, statistical techniques for disclosure and inference control, query auditing methods, and randomization and perturbation-based techniques. This edited volume contains surveys by distinguished researchers in the privacy field. Each survey covers the key research content as well as future research directions. Privacy-Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science, and is also suitable for industry practitioners.
