Fast, Privacy Preserving Linear Regression over Distributed Datasets based on Pre-Distributed Data

This work proposes a protocol for performing linear regression over a dataset that is distributed among multiple parties. The parties jointly compute a linear regression model without sharing their private datasets. We provide security definitions, a protocol, and security proofs. Our solution is information-theoretically secure and assumes that a trusted initializer pre-distributes random, correlated data to the parties during a setup phase. The actual computation happens later, during an online phase, and does not involve the trusted initializer. Our online protocol is orders of magnitude faster than previous solutions. For the case where a trusted initializer is not available, we propose a computationally secure two-party protocol based on additively homomorphic encryption that replaces the trusted initializer. In this case, the online phase remains the same, while the offline phase becomes computationally heavy. However, because the computations in the offline phase operate on random data, the overall problem is embarrassingly parallelizable, which makes it faster than existing solutions on processors with a sufficient number of cores.
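To make the setup/online split concrete, here is a minimal sketch of a two-party inner product computed with pre-distributed correlated randomness. The inner product is a basic building block of matrix computations such as those used in linear regression. This is an illustration of the trusted-initializer model in general, not the protocol of this work: the modulus `Q`, the function names, and the message layout are all assumptions chosen for the example, and real-valued data would additionally need a fixed-point encoding that is omitted here.

```python
import secrets

# Illustrative modulus (assumption): all arithmetic is done in Z_Q.
Q = 2**61 - 1


def rand_vec(n):
    """Return n uniformly random elements of Z_Q."""
    return [secrets.randbelow(Q) for _ in range(n)]


def dot(a, b):
    """Inner product of two equal-length vectors, reduced mod Q."""
    return sum(ai * bi for ai, bi in zip(a, b)) % Q


def trusted_initializer(n):
    """Setup phase (hypothetical helper): sample correlated randomness.

    The output is independent of the parties' inputs, so it can be
    generated before the inputs are known.
    Alice receives x0; Bob receives (y0, s0) with s0 = <x0, y0>.
    """
    x0, y0 = rand_vec(n), rand_vec(n)
    return x0, (y0, dot(x0, y0))


def online_inner_product(x, y, alice_pre, bob_pre):
    """Online phase: Alice holds x, Bob holds y.

    Returns additive shares (u, v) with u + v = <x, y> mod Q.
    Both parties are simulated in one function for brevity.
    """
    x0 = alice_pre
    y0, s0 = bob_pre

    # Bob -> Alice: y masked with his pre-distributed vector.
    y1 = [(yi - y0i) % Q for yi, y0i in zip(y, y0)]

    # Alice -> Bob: x masked with her vector, plus a blinded partial result.
    r = secrets.randbelow(Q)  # Alice's output share
    x1 = [(xi + x0i) % Q for xi, x0i in zip(x, x0)]
    r1 = (dot(x, y1) - r) % Q

    # Bob computes his share locally from what he received.
    v = (dot(x1, y0) + r1 - s0) % Q
    return r, v


alice_pre, bob_pre = trusted_initializer(3)
u, v = online_inner_product([1, 2, 3], [4, 5, 6], alice_pre, bob_pre)
print((u + v) % Q)  # 32
```

The online phase uses only additions and multiplications modulo `Q`, with no public-key operations. Each party sees only values masked by uniformly random elements, which is what makes information-theoretic security possible in this model.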
