Privacy-preserving boosting

We describe two algorithms, BiBoost (Bipartite Boosting) and MultBoost (Multiparty Boosting), that allow two or more participants to construct a boosting classifier without explicitly sharing their data sets. We analyze both the computational and the security aspects of the algorithms. The algorithms inherit the excellent generalization performance of AdaBoost. Experiments indicate that the algorithms are better than AdaBoost executed separately by the participants, and that, independently of the number of participants, they perform close to AdaBoost executed using the entire data set.

[1]  BertinoElisa,et al.  A Framework for Evaluating Privacy Preserving Data Mining Algorithms , 2005 .

[2]  Salvatore J. Stolfo,et al.  The application of AdaBoost for distributed, scalable and on-line learning , 1999, KDD '99.

[3]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[4]  Joan Feigenbaum,et al.  Secure multiparty computation of approximations , 2001, TALG.

[5]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[6]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[7]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[8]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[9]  Chris Clifton Privacy Preserving Distributed Data Mining , 2001 .

[10]  J. Friedman Stochastic gradient boosting , 2002 .

[11]  Benny Pinkas,et al.  Efficient Private Matching and Set Intersection , 2004, EUROCRYPT.

[12]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[13]  Stephen E. Fienberg,et al.  Data Swapping: Variations on a Theme by Dalenius and Reiss , 2004, Privacy in Statistical Databases.

[14]  David Chaum,et al.  Multiparty unconditionally secure protocols , 1988, STOC '88.

[15]  Elisa Bertino,et al.  A Framework for Evaluating Privacy Preserving Data Mining Algorithms* , 2005, Data Mining and Knowledge Discovery.

[16]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[17]  Oded Goldreich,et al.  Foundations of Cryptography: Volume 2, Basic Applications , 2004 .

[18]  Jugal K. Kalita,et al.  Efficient handling of high-dimensional feature spaces by randomized classifier ensembles , 2002, KDD.

[19]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[20]  Chris Clifton,et al.  Privately Computing a Distributed k-nn Classifier , 2004, PKDD.

[21]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[22]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[23]  Ira S. Moskowitz,et al.  An Integrated Framework for Database Privacy Protection , 2000, DBSec.

[24]  Avi Wigderson,et al.  Completeness theorems for non-cryptographic fault-tolerant distributed computation , 1988, STOC '88.

[25]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[26]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[27]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[28]  Silvio Micali,et al.  How to play ANY mental game , 1987, STOC.

[29]  Chris Clifton,et al.  When do data mining results violate privacy? , 2004, KDD.

[30]  Bala Kalyanasundaram,et al.  The Probabilistic Communication Complexity of Set Intersection , 1992, SIAM J. Discret. Math..

[31]  J. Ross Quinlan,et al.  Bagging, Boosting, and C4.5 , 1996, AAAI/IAAI, Vol. 1.

[32]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[33]  H. Vincent Poor,et al.  Consistency in models for distributed learning under communication constraints , 2005, IEEE Transactions on Information Theory.

[34]  Tal Rabin,et al.  Verifiable secret sharing and multiparty protocols with honest majority , 1989, STOC '89.

[35]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[36]  Jaideep Vaidya,et al.  Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data , 2006, SAC.

[37]  David Chaum,et al.  Multiparty Computations Ensuring Privacy of Each Party's Input and Correctness of the Result , 1987, CRYPTO.

[38]  Chris Clifton,et al.  Privacy Preserving Naïve Bayes Classifier for Vertically Partitioned Data , 2004, SDM.

[39]  C. Andrew Neff,et al.  A verifiable secret shuffle and its application to e-voting , 2001, CCS '01.

[40]  David Chaum,et al.  Electronic Mail, Return Address, and Digital Pseudonyms , 1981 .

[41]  Yelena Yesha,et al.  Data Mining: Next Generation Challenges and Future Directions , 2004 .

[42]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[43]  Stewart S. Miller Parallel Databases , 2001, High-Performance Web Databases.

[44]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[45]  Dawn Xiaodong Song,et al.  Privacy-Preserving Set Operations , 2005, CRYPTO.

[46]  Elisa Bertino,et al.  State-of-the-art in privacy preserving data mining , 2004, SGMD.

[47]  Zoran Obradovic,et al.  Boosting Algorithms for Parallel and Distributed Learning , 2022 .

[48]  Vassilios S. Verykios,et al.  Disclosure limitation of sensitive rules , 1999, Proceedings 1999 Workshop on Knowledge and Data Engineering Exchange (KDEX'99) (Cat. No.PR00453).

[49]  Kazue Sako,et al.  An Efficient Scheme for Proving a Shuffle , 2001, CRYPTO.

[50]  Balázs Kégl Robust Regression by Boosting the Median , 2003, COLT.

[51]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[52]  Hoeteck Wee,et al.  Toward Privacy in Public Databases , 2005, TCC.

[53]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[54]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[55]  Somesh Jha,et al.  Privacy Preserving Clustering , 2005, ESORICS.

[56]  Chi-Jen Lu,et al.  Oblivious polynomial evaluation and oblivious neural learning , 2001, Theor. Comput. Sci..

[57]  Jaideep Vaidya,et al.  Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data , 2003 .

[58]  Andrew Chi-Chih Yao,et al.  How to Generate and Exchange Secrets (Extended Abstract) , 1986, FOCS.

[59]  Alexandre V. Evfimievski,et al.  Randomization in privacy preserving data mining , 2002, SKDD.