Distributed Privacy Preserving Classification Based on Local Cluster Identifiers

This paper addresses privacy-preserving classification for vertically partitioned datasets. We present an approach based on information hiding that is similar to the basic idea of microaggregation. Each party clusters its own data locally and replaces the original attribute values with cluster identifiers, which masks its dataset. The masked datasets can then be integrated and used to train a classifier without further privacy restrictions. We apply our approach to four standard machine learning datasets and report the results.
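
The masking step described above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the classifier, so everything concrete here is an assumption made for illustration: a plain k-means routine stands in for the local clustering, a majority-vote lookup table stands in for the classifier, the two parties' records are assumed to be aligned by a shared row index, the class labels are assumed to be available to whoever trains the model, and the toy data is invented.

```python
import random
from collections import Counter, defaultdict


def local_cluster_ids(rows, k, seed=0, iters=20):
    """Run a plain Lloyd-style k-means on one party's rows (assumed
    stand-in for the local clustering) and return one cluster id per row."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)
    ids = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest center (squared Euclidean distance).
        ids = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])))
            for r in rows
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [r for r, i in zip(rows, ids) if i == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return ids


def train_majority_table(masked_rows, labels):
    """Hypothetical classifier: majority label per tuple of cluster ids."""
    votes = defaultdict(Counter)
    for key, y in zip(masked_rows, labels):
        votes[key][y] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}


# Vertically partitioned toy data: same six records, different attributes.
party_a = [(1.0, 1.1), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9), (1.1, 0.9), (4.9, 5.0)]
party_b = [(10.0,), (10.2,), (20.0,), (19.8,), (9.9,), (20.1,)]
labels = ["neg", "neg", "pos", "pos", "neg", "pos"]

# Each party clusters locally and releases only cluster identifiers.
ids_a = local_cluster_ids(party_a, k=2, seed=1)
ids_b = local_cluster_ids(party_b, k=2, seed=2)

# Integration: a record is now just a tuple of cluster ids, one per party.
masked = list(zip(ids_a, ids_b))

model = train_majority_table(masked, labels)
predictions = [model[m] for m in masked]
```

In this sketch the integrated table `masked` contains no original attribute values, only small integers whose meaning is local to each party. The number of clusters `k` controls the trade-off: fewer clusters hide more detail but leave the classifier less information to work with.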
