Combining Binary Classifiers for a Multiclass Problem with Differential Privacy

The multiclass classification problem is often solved by combining binary classifiers into ensembles. While this is required for inherently binary classifiers, such as SVMs, it also provides performance advantages for other classifiers. In this paper, we address the problem of combining binary classifiers into ensembles in the differentially private data publishing framework, where data privacy is achieved by anonymization. The main idea of this paper is to counter the inevitable loss of data quality caused by anonymization by building an ensemble of binary classifiers and then using an error-correcting approach to obtain a class decision from this ensemble. We describe the proposed algorithm and present the results of extensive experimentation on synthetic and UC Irvine data. We find that while building ensembles after anonymization leads to no change in classifier accuracy, preparing the data for ensembles prior to anonymization improves accuracy in most cases.
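The error-correcting step can be illustrated with a minimal sketch. The abstract does not specify the code matrix or the decoding rule, so everything below is an assumption made for illustration: a hypothetical four-class, six-bit code matrix with minimum Hamming distance 3, and nearest-codeword (Hamming) decoding in the style of error-correcting output codes. Each bit position stands for the output of one binary classifier.

```python
# Illustrative sketch only: CODE_MATRIX and the example bit vectors are
# hypothetical and are not taken from the paper.

# One codeword per class; column j is the target labelling for binary classifier j.
# The minimum pairwise Hamming distance is 3, so any single wrong bit is corrected.
CODE_MATRIX = {
    "A": (0, 0, 0, 0, 0, 0),
    "B": (1, 1, 1, 0, 0, 0),
    "C": (0, 0, 1, 1, 1, 1),
    "D": (1, 1, 0, 1, 1, 1),
}


def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits):
    """Return the class whose codeword is closest to the ensemble's output bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


# All six binary classifiers agree with the codeword for class B.
clean = (1, 1, 1, 0, 0, 0)
# The second classifier errs (for example, because of noise added for privacy).
noisy = (1, 0, 1, 0, 0, 0)

print(decode(clean), decode(noisy))
```

In this sketch a single erroneous binary decision still decodes to the intended class, which is the property an error-correcting ensemble relies on when individual classifiers are degraded by anonymization.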
