Privacy preserving based logistic regression on big data

Abstract Cloud computing offers strong computing power and large storage capacity. Combined with cloud computing, machine learning algorithms make the processing of large-scale data practical. Logistic regression is a widely used machine learning classification algorithm that can be implemented in the cloud. However, data privacy cannot be guaranteed during big data processing, because the training data may be leaked. To prevent privacy leakage from logistic regression in the cloud and to improve the efficiency of processing training data, this paper proposes a Privacy Preserving Logistic Regression Algorithm (PPLRA). Homomorphic encryption is used to encrypt private data when they are uploaded for training. Moreover, the Sigmoid function in logistic regression is approximated using Taylor's theorem, so that it can be computed securely under homomorphic encryption. Experimental results show that PPLRA is effective at preserving data privacy and processes data more efficiently. A comparison with the Non-Privacy Preserving Logistic Regression Algorithm (NPPLRA) shows that computational efficiency is improved by about 1.2 times.
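
The Taylor approximation mentioned in the abstract can be illustrated with a minimal plaintext sketch. It replaces the Sigmoid function with its Maclaurin polynomial, 1/2 + x/4 - x^3/48 + x^5/480, which needs only additions and multiplications, the operations that homomorphic encryption schemes typically support. The function names, the default degree, and the choice to expand around 0 are assumptions made for illustration; the abstract does not state the polynomial degree or the encryption scheme used by PPLRA, and no encryption is performed here.

```python
import math


def sigmoid(x: float) -> float:
    """Exact Sigmoid function, used here only as a reference value."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_taylor(x: float, degree: int = 3) -> float:
    """Maclaurin polynomial of the Sigmoid function, truncated at `degree`.

    Assumption: expansion around 0 with degree between 1 and 5.
    Only additions and multiplications are used, so the same arithmetic
    could in principle be carried out on ciphertexts.
    """
    if not 1 <= degree <= 5:
        raise ValueError("degree must be between 1 and 5")
    # Coefficients of 1, x, x^2, x^3, x^4, x^5 in the Maclaurin series.
    coeffs = [0.5, 0.25, 0.0, -1.0 / 48.0, 0.0, 1.0 / 480.0]
    result = 0.0
    # Horner's rule: evaluate from the highest-order coefficient down.
    for c in reversed(coeffs[: degree + 1]):
        result = result * x + c
    return result


def max_abs_error(points, degree: int = 3) -> float:
    """Largest absolute gap between the polynomial and the exact Sigmoid."""
    return max(abs(sigmoid(p) - sigmoid_taylor(p, degree)) for p in points)


if __name__ == "__main__":
    grid = [i / 10.0 for i in range(-10, 11)]  # inputs in [-1, 1]
    for d in (1, 3, 5):
        print(f"degree {d}: max abs error on [-1, 1] = {max_abs_error(grid, d):.2e}")
```

The approximation is only accurate near 0, so inputs generally have to be scaled into a small interval before the polynomial is applied; higher degrees reduce the error at the cost of more multiplications.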
