Privacy-preserving ridge regression on distributed data

Abstract

Ridge regression is a statistical method for modeling a linear relationship between a dependent variable and a set of explanatory values. It is a building block that plays a major role in many learning algorithms, such as those behind recommendation systems. However, in many applications such as e-health, the explanatory values contain private information owned by different patients who are unwilling to share it unless data privacy is guaranteed. In this paper, we propose a protocol for conducting privacy-preserving ridge regression (PPRR) over high-dimensional data. In our protocol, each user submits their data in encrypted form to an evaluator, and the evaluator computes a linear model over all users' data without learning its contents. The core encryption method has homomorphic properties that enable the evaluator to perform ridge regression over encrypted data. We implement our protocol and demonstrate that it is suitable for high-dimensional data distributed among millions of users. We also compare our protocol with state-of-the-art solutions in terms of both computation and communication costs. The results show that our protocol outperforms most existing approaches based on secure multi-party computation, garbled circuits, fully homomorphic encryption, secret sharing, and hybrid methods.
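To illustrate why ridge regression lends itself to aggregation over many users, the sketch below computes the standard closed-form solution w = (sum_i x_i x_i^T + lambda I)^(-1) (sum_i y_i x_i) from per-user contributions. It runs entirely on plaintext and is not the protocol proposed in the paper. The function names (`local_contribution`, `aggregate`, `ridge`) and the toy data are hypothetical, and the idea that an evaluator would form these sums under homomorphic encryption is an assumption made only for illustration.

```python
from fractions import Fraction


def local_contribution(x, y):
    """Per-user terms: the outer product x x^T and the vector y * x."""
    d = len(x)
    A = [[Fraction(x[i]) * Fraction(x[j]) for j in range(d)] for i in range(d)]
    b = [Fraction(y) * Fraction(x[i]) for i in range(d)]
    return A, b


def aggregate(contributions, d):
    """Sum all per-user terms. Only additions are needed here, which is the
    step an additively homomorphic scheme could perform on ciphertexts
    (an assumption; this sketch adds plaintext values)."""
    A = [[Fraction(0)] * d for _ in range(d)]
    b = [Fraction(0)] * d
    for A_i, b_i in contributions:
        for r in range(d):
            b[r] += b_i[r]
            for c in range(d):
                A[r][c] += A_i[r][c]
    return A, b


def solve(M, v):
    """Solve M w = v by Gauss-Jordan elimination with exact fractions."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [e / p for e in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [e - f * pe for e, pe in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def ridge(users, lam):
    """Ridge weights from a list of (x, y) pairs and regularizer lam > 0."""
    d = len(users[0][0])
    A, b = aggregate([local_contribution(x, y) for x, y in users], d)
    for i in range(d):
        A[i][i] += Fraction(lam)  # add lambda * I
    return solve(A, b)


if __name__ == "__main__":
    # Three hypothetical users, each holding one record (x, y).
    users = [([1, 0], 1), ([0, 1], 2), ([1, 1], 3)]
    print(ridge(users, lam=1))
```

Because the evaluator-side work before the final solve consists only of sums, the sketch suggests why encryption schemes with homomorphic properties are a natural fit for this setting; how the paper's protocol encodes values and performs the final solve is not specified in the abstract.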
