Privacy-preserving training of tree ensembles over continuous data

Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard “in the clear” algorithm for growing decision trees on data with continuous values sorts the training examples on each feature in order to find an optimal cut-point in the range of feature values at each node. Sorting is an expensive operation in MPC, so finding secure protocols that avoid it is a relevant problem in privacy-preserving machine learning. In this paper we propose three more efficient alternatives for secure training of decision-tree-based models on data with continuous features: (1) secure discretization of the data, followed by secure training of a decision tree over the discretized data; (2) secure discretization of the data, followed by secure training of a random forest over the discretized data; and (3) secure training of extremely randomized trees (“extra-trees”) on the original data. Approaches (2) and (3) both randomize the choice of features. In approach (3) the cut-points are chosen at random as well, which removes the need to sort or discretize the data up front. We implemented all proposed solutions in the semi-honest setting with MPC based on additive secret sharing. In addition to mathematically proving that all proposed approaches are correct and secure, we experimentally evaluated and compared them in terms of classification accuracy and runtime. We privately train tree ensembles over data sets with thousands of instances or features in a few minutes, with accuracies on par with those obtained in the clear. This makes our solution orders of magnitude more efficient than existing approaches, which are based on oblivious sorting.
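To illustrate why approach (3) needs no sorting, the following is a minimal in-the-clear sketch of how a single extra-trees node might pick its split. It is not the secure protocol from the paper: the values here are plaintext, whereas in the MPC setting they would be secret-shared. The function names (`gini`, `split_score`, `random_split`), the parameter `n_candidates`, and the use of the Gini impurity as the scoring criterion are assumptions made for this example only.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_score(column, labels, cut):
    """Weighted Gini impurity of the two sides of the split x < cut."""
    left = [y for x, y in zip(column, labels) if x < cut]
    right = [y for x, y in zip(column, labels) if x >= cut]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def random_split(rows, labels, n_candidates, rng):
    """Pick random features, draw one random cut-point per feature,
    and return the best (score, feature, cut) among those candidates.

    Only the minimum and maximum of each feature are needed,
    so the training examples are never sorted.
    """
    n_features = len(rows[0])
    chosen = rng.sample(range(n_features), min(n_candidates, n_features))
    best = None
    for feature in chosen:
        column = [row[feature] for row in rows]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature: no useful split
        cut = rng.uniform(lo, hi)  # random cut-point, no sorting required
        score = split_score(column, labels, cut)
        if best is None or score < best[0]:
            best = (score, feature, cut)
    return best


# Toy data: feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
labels = [0, 0, 1, 1]
result = random_split(rows, labels, n_candidates=2, rng=random.Random(0))
```

In this sketch the per-feature work is one pass for the minimum and maximum and one pass for the score, in contrast to the sort-then-scan search over all candidate cut-points used by the standard algorithm.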
