Enabling Multilevel Trust in Privacy Preserving Data Mining

Privacy Preserving Data Mining (PPDM) addresses the problem of developing accurate models about aggregated data without access to precise information in individual data records. A widely studied perturbation-based PPDM approach introduces random perturbation to individual values to preserve privacy before data are published. Previous solutions of this approach are limited by their tacit assumption of single-level trust in data miners. In this work, we relax this assumption and expand the scope of perturbation-based PPDM to Multilevel Trust (MLT-PPDM). In our setting, the more trusted a data miner is, the less perturbed the copy of the data it can access. Under this setting, a malicious data miner may have access to differently perturbed copies of the same data through various means, and may combine these diverse copies to jointly infer additional information about the original data that the data owner does not intend to release. Preventing such diversity attacks is the key challenge of providing MLT-PPDM services. We address this challenge by properly correlating perturbation across copies at different trust levels. We prove that our solution is robust against diversity attacks with respect to our privacy goal. That is, for data miners who have access to an arbitrary collection of the perturbed copies, our solution prevents them from jointly reconstructing the original data more accurately than the best effort using any individual copy in the collection. Our solution allows a data owner to generate perturbed copies of its data for arbitrary trust levels on demand. This feature offers data owners maximum flexibility.
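
To make the idea of correlating perturbation across trust levels concrete, the following is a minimal sketch, not the construction proved in the paper. It assumes additive zero-mean Gaussian noise and builds the copies by nesting: each more heavily perturbed copy is the previous copy plus fresh independent noise. The function name `nested_perturbed_copies` and its parameters are hypothetical. Under this assumption, a more perturbed copy is just a noisier version of a less perturbed one, so combining copies gives no better estimate of the original than the least perturbed copy alone. With independent noise per copy, by contrast, averaging the copies would reduce the noise.

```python
import random


def nested_perturbed_copies(data, sigmas, seed=None):
    """Return one perturbed copy of `data` per noise level in `sigmas`.

    Assumptions (illustrative, not taken from the paper):
    - `sigmas` holds noise standard deviations in ascending order,
      i.e. from the most trusted level to the least trusted one.
    - Noise is additive, zero-mean Gaussian.

    Copy i equals data + Z_i, where Z_i = Z_{i-1} + fresh independent noise
    of variance sigma_i^2 - sigma_{i-1}^2, so Var(Z_i) = sigma_i^2 and
    Cov(Z_i, Z_j) = min(sigma_i, sigma_j)^2.
    """
    sigmas = list(sigmas)
    if sigmas != sorted(sigmas) or any(s < 0 for s in sigmas):
        raise ValueError("sigmas must be non-negative and ascending")

    rng = random.Random(seed)
    noise = [0.0] * len(data)  # running noise Z_i, one entry per value
    prev_var = 0.0
    copies = []
    for sigma in sigmas:
        # Standard deviation of the extra noise needed to reach this level.
        extra_std = (sigma ** 2 - prev_var) ** 0.5
        noise = [z + rng.gauss(0.0, extra_std) for z in noise]
        copies.append([x + z for x, z in zip(data, noise)])
        prev_var = sigma ** 2
    return copies
```

For example, `nested_perturbed_copies(values, [0.5, 1.0, 2.0])` would return three copies of `values`, the first for the most trusted data miners and the last for the least trusted. This batch version fixes all levels in advance; it does not illustrate the on-demand generation described above.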
