Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

Abstract We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.

[1]  Martin H. Weik A SECOND SURVEY OF DOMESTIC ELECTRONIC DIGITAL COMPUTING SYSTEMS.. , 1957 .

[2]  M. H. Weik,et al.  A Third Survey of Domestic Electronic Digital Computing Systems, Report No. 1115 , 1961 .

[3]  J. H. Wilkinson The algebraic eigenvalue problem , 1966 .

[4]  Donald E. Knuth,et al.  The art of computer programming. Vol.2: Seminumerical algorithms , 1981 .

[5]  Andrew Chi-Chih Yao,et al.  How to generate and exchange secrets , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[6]  Donald Beaver,et al.  Efficient Multiparty Protocols Using Circuit Randomization , 1991, CRYPTO.

[7]  Stephen J. Wright,et al.  Numerical Optimization , 2018, Fundamental Statistical Inference.

[8]  Moni Naor,et al.  Privacy preserving auctions and mechanism design , 1999, EC '99.

[9]  Niv Gilboa,et al.  Two Party RSA Key Generation , 1999, CRYPTO.

[10]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[11]  Wenliang Du,et al.  Protocols for Secure Remote Database Access with Approximate Matching , 2001, E-Commerce Security and Privacy.

[12]  Moni Naor,et al.  Efficient oblivious transfer protocols , 2001, SODA '01.

[13]  Wenliang Du,et al.  Privacy-preserving cooperative scientific computations , 2001, Proceedings. 14th IEEE Computer Security Foundations Workshop, 2001..

[14]  Oded Goldreich,et al.  The Foundations of Cryptography - Volume 2: Basic Applications , 2001 .

[15]  Alok Baveja,et al.  Computing , Artificial Intelligence and Information Technology A data-driven software tool for enabling cooperative information sharing among police departments , 2002 .

[16]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[17]  Xiaodong Lin,et al.  Regression on Distributed Databases via Secure Multi-Party Computation , 2004, DG.O.

[18]  Xiaodong Lin,et al.  Privacy preserving regression modelling via distributed computation , 2004, KDD.

[19]  Tomás Lang,et al.  Digit-Serial Arithmetic , 2004 .

[20]  Oded Goldreich,et al.  Foundations of Cryptography: Volume 2, Basic Applications , 2004 .

[21]  Yunghsiang Sam Han,et al.  Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification , 2004, SDM.

[22]  Michael O. Rabin,et al.  How To Exchange Secrets with Oblivious Transfer , 2005, IACR Cryptol. ePrint Arch..

[23]  G. Meurant The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations , 2006 .

[24]  Matthew K. Franklin,et al.  Efficiency Tradeoffs for Malicious Two-Party Computation , 2006, Public Key Cryptography.

[25]  Jaideep Vaidya,et al.  Privacy-Preserving SVM Classification on Vertically Partitioned Data , 2006, PAKDD.

[26]  Joan Feigenbaum,et al.  Secure multiparty computation of approximations , 2001, TALG.

[27]  A. Asuncion,et al.  UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences , 2007 .

[28]  Vladimir Kolesnikov,et al.  Improved Garbled Circuit: Free XOR Gates and Applications , 2008, ICALP.

[29]  Dan Bogdanov,et al.  Sharemind: A Framework for Fast Privacy-Preserving Computations , 2008, ESORICS.

[30]  Yehuda Lindell,et al.  A Proof of Security of Yao’s Protocol for Two-Party Computation , 2009, Journal of Cryptology.

[31]  Paulo Cortez,et al.  Using data mining to predict secondary school student performance , 2008 .

[32]  Yehuda Lindell,et al.  Security Against Covert Adversaries: Efficient Protocols for Realistic Adversaries , 2007, Journal of Cryptology.

[33]  Paulo Cortez,et al.  Modeling wine preferences by data mining from physicochemical properties , 2009, Decis. Support Syst..

[34]  Paulo Cortez,et al.  Using Data Mining for Wine Quality Assessment , 2009, Discovery Science.

[35]  Yehuda Lindell,et al.  Secure Two-Party Computation via Cut-and-Choose Oblivious Transfer , 2010, IACR Cryptol. ePrint Arch..

[36]  Ahmad-Reza Sadeghi,et al.  TASTY: tool for automating secure two-party computations , 2010, CCS '10.

[37]  Jonathan Katz,et al.  Faster Secure Two-Party Computation Using Garbled Circuits , 2011, USENIX Security Symposium.

[38]  Abhi Shelat,et al.  Efficient Secure Computation with Garbled Circuits , 2011, ICISS.

[39]  Claudio Orlandi,et al.  A New Approach to Practical Active-Secure Two-Party Computation , 2012, IACR Cryptol. ePrint Arch..

[40]  S. Fienberg,et al.  Secure multiple linear regression based on homomorphic encryption , 2011 .

[41]  Ivan Damgård,et al.  Multiparty Computation from Somewhat Homomorphic Encryption , 2012, IACR Cryptol. ePrint Arch..

[42]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[43]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[44]  Krisztian Buza,et al.  Feedback Prediction for Blogs , 2012, GfKl.

[45]  Jonathan Katz,et al.  Secure two-party computation in sublinear (amortized) time , 2012, CCS.

[46]  Michael Naehrig,et al.  ML Confidential: Machine Learning on Encrypted Data , 2012, ICISC.

[47]  Yehuda Lindell,et al.  More efficient oblivious transfer and extensions for faster secure computation , 2013, CCS.

[48]  Stratis Ioannidis,et al.  Privacy-Preserving Ridge Regression on Hundreds of Millions of Records , 2013, 2013 IEEE Symposium on Security and Privacy.

[49]  Mihir Bellare,et al.  Efficient Garbling from a Fixed-Key Blockcipher , 2013, 2013 IEEE Symposium on Security and Privacy.

[50]  Hadi Fanaee-T,et al.  Event labeling combining ensemble detectors and background knowledge , 2014, Progress in Artificial Intelligence.

[51]  João Gama,et al.  Bike-Sharing Dataset , 2013 .

[52]  Marina Blanton,et al.  Data-oblivious graph algorithms for secure computation and outsourcing , 2013, ASIA CCS '13.

[53]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[54]  Marcel Keller,et al.  Efficient, Oblivious Data Structures for MPC , 2014, IACR Cryptol. ePrint Arch..

[55]  Sander Siim,et al.  Combining Secret Sharing and Garbled Circuits for Efficient Private IEEE 754 Floating-Point Computations , 2015, Financial Cryptography Workshops.

[56]  Michael Zohner,et al.  ABY - A Framework for Efficient Mixed-Protocol Secure Two-Party Computation , 2015, NDSS.

[57]  David Evans,et al.  Obliv-C: A Language for Extensible Data-Oblivious Computation , 2015, IACR Cryptol. ePrint Arch..

[58]  Yehuda Lindell Fast Cut-and-Choose-Based Protocols for Malicious and Covert Adversaries , 2015, Journal of Cryptology.

[59]  R. Huerta,et al.  Data set from chemical sensor array exposed to turbulent gas mixtures , 2015, Data in brief.

[60]  Sadique Sheik,et al.  Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring , 2015 .

[61]  Ahmad-Reza Sadeghi,et al.  TinyGarble: Highly Compressed and Scalable Sequential Garbled Circuits , 2015, 2015 IEEE Symposium on Security and Privacy.

[62]  David Evans,et al.  Two Halves Make a Whole - Reducing Data Transfer in Garbled Circuits Using Half Gates , 2015, EUROCRYPT.

[63]  Stratis Ioannidis,et al.  GraphSC: Parallel Secure Computation Made Easy , 2015, 2015 IEEE Symposium on Security and Privacy.

[64]  Yehuda Lindell,et al.  Efficient Constant Round Multi-Party Computation Combining BMR and SPDZ , 2015, IACR Cryptol. ePrint Arch..

[65]  Ahmad-Reza Sadeghi,et al.  Automated Synthesis of Optimized Circuits for Secure Computation , 2015, CCS.

[66]  Kartik Nayak,et al.  ObliVM: A Programming Framework for Secure Computation , 2015, 2015 IEEE Symposium on Security and Privacy.

[67]  Martine De Cock,et al.  Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data , 2015, AISec@CCS.

[68]  Alex J. Malozemoff,et al.  Faster Two-Party Computation Secure Against Malicious Adversaries in the Single-Execution Setting , 2016, IACR Cryptol. ePrint Arch..

[69]  Peeter Laud,et al.  Secure Multiparty Sorting Protocols with Covert Privacy , 2016, NordSec.

[70]  Yehuda Lindell,et al.  How To Simulate It - A Tutorial on the Simulation Proof Technique , 2016, IACR Cryptol. ePrint Arch..

[71]  Marcel Keller,et al.  MASCOT: Faster Malicious Arithmetic Secure Computation with Oblivious Transfer , 2016, IACR Cryptol. ePrint Arch..

[72]  Berry Schoenmakers,et al.  Certificate Validation in Secure Computation and Its Use in Verifiable Linear Programming , 2016, AFRICACRYPT.

[73]  Alex J. Malozemoff,et al.  Faster Secure Two-Party Computation in the Single-Execution Setting , 2017, EUROCRYPT.

[74]  Angela Wiesberg Machine learning on encrypted data , 2018 .

[75]  Dan Bogdanov,et al.  Rmind: A Tool for Cryptographically Secure Statistical Analysis , 2016, IEEE Transactions on Dependable and Secure Computing.

[76]  Silvio Micali,et al.  A Completeness Theorem for Protocols with Honest Majority , 1987, STOC 1987.

[77]  Yehuda Lindell,et al.  Efficient Constant-Round Multi-party Computation Combining BMR and SPDZ , 2019, Journal of Cryptology.