Differentially private Naive Bayes learning over multiple data sources

Abstract To meet diverse data-analysis requirements, machine learning classifiers are used as tools to evaluate data in many applications. Because of privacy concerns about disclosing sensitive information, data owners are often reluctant to give their data to an untrusted trainer for classifier training. Existing work has proposed privacy-preserving learning algorithms that allow a trainer to build a classifier over data from a single owner. However, these solutions cannot be applied directly to the multi-owner setting, in which the owners do not fully trust one another. In this paper, we propose a novel privacy-preserving Naive Bayes learning scheme for multiple data sources. The proposed scheme enables a trainer to train a Naive Bayes classifier over a dataset contributed jointly by different data owners, without the help of a trusted curator. The trained classifier satisfies ϵ-differential privacy, and the training process does not compromise the privacy of any individual owner. We implement a prototype of the scheme and evaluate it experimentally.
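The abstract does not spell out the protocol, so the following is only a minimal sketch of one standard way to get ϵ-differentially private Naive Bayes counts from several owners without a trusted curator; it is not claimed to be the paper's scheme. It assumes categorical features with a public schema, add/remove-one-record neighbouring datasets, and the Laplace mechanism, with the Laplace noise generated in a distributed way: each owner adds a share G1 − G2 with G1, G2 ~ Gamma(1/n, b), and the n shares sum to exactly Laplace(0, b). All function names (`noise_share`, `aggregate_noisy_counts`, `train`, `predict`) are hypothetical. In this sketch the trainer sees each owner's partially noised counts; a real deployment would have to combine them under secure aggregation (for example, additively homomorphic encryption) so that only the total is revealed, and it also assumes that no owner drops out or withholds its noise share.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A record is (tuple of categorical feature values, class label).
Record = Tuple[Tuple[str, ...], str]


def noise_share(scale: float, n_owners: int, rng: random.Random) -> float:
    """One owner's share of Laplace(0, scale) noise (hypothetical helper).

    With G1, G2 ~ Gamma(shape=1/n, scale), summing (G1 - G2) over n owners gives
    exactly Laplace(0, scale): n Gamma(1/n) terms sum to an Exponential, and the
    difference of two independent Exponentials is Laplace.
    """
    shape = 1.0 / n_owners
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)


def public_keys(classes: List[str], domains: List[List[str]]) -> List[tuple]:
    """All count cells, enumerated from the public schema (not from the data)."""
    keys: List[tuple] = [("class", c) for c in classes]
    keys += [("feat", j, v, c)
             for c in classes for j, dom in enumerate(domains) for v in dom]
    return keys


def local_counts(records: Sequence[Record]) -> Counter:
    """Owner-side counting: class counts and (feature, value, class) counts."""
    counts: Counter = Counter()
    for features, label in records:
        counts[("class", label)] += 1
        for j, value in enumerate(features):
            counts[("feat", j, value, label)] += 1
    return counts


def aggregate_noisy_counts(owners: List[Sequence[Record]], classes: List[str],
                           domains: List[List[str]], epsilon: float,
                           rng: random.Random) -> Dict[tuple, float]:
    """Each owner adds a noise share to every cell; the trainer only sums.

    One record changes one class cell and len(domains) feature cells by 1,
    so the L1 sensitivity of the whole histogram is len(domains) + 1.
    """
    scale = (len(domains) + 1) / epsilon
    keys = public_keys(classes, domains)
    total = {k: 0.0 for k in keys}
    for records in owners:
        counts = local_counts(records)
        for k in keys:
            total[k] += counts[k] + noise_share(scale, len(owners), rng)
    return total


def train(noisy: Dict[tuple, float], classes: List[str],
          domains: List[List[str]]) -> dict:
    """Log-probabilities with add-one smoothing; clamping is post-processing."""
    pos = lambda x: max(x, 0.0)  # negative noisy counts are clamped to 0
    n_total = sum(pos(noisy[("class", c)]) for c in classes)
    model = {}
    for c in classes:
        prior = math.log((pos(noisy[("class", c)]) + 1.0) / (n_total + len(classes)))
        cond = []
        for j, dom in enumerate(domains):
            feat = {v: pos(noisy[("feat", j, v, c)]) for v in dom}
            denom = sum(feat.values()) + len(dom)
            cond.append({v: math.log((feat[v] + 1.0) / denom) for v in dom})
        model[c] = (prior, cond)
    return model


def predict(model: dict, features: Tuple[str, ...]) -> str:
    """Pick the class with the highest log-posterior score."""
    def score(c: str) -> float:
        prior, cond = model[c]
        return prior + sum(cond[j][v] for j, v in enumerate(features))
    return max(model, key=score)


if __name__ == "__main__":
    classes = ["yes", "no"]
    domains = [["sunny", "rainy"], ["hot", "cold"]]
    owner_a = [(("sunny", "hot"), "yes")] * 30 + [(("rainy", "cold"), "no")] * 5
    owner_b = [(("sunny", "hot"), "yes")] * 5 + [(("rainy", "cold"), "no")] * 30
    noisy = aggregate_noisy_counts([owner_a, owner_b], classes, domains,
                                   epsilon=1.0, rng=random.Random(0))
    model = train(noisy, classes, domains)
    print(predict(model, ("sunny", "hot")))
```

Because the cells are enumerated from the public schema rather than from the data, an owner's zero counts are noised too, so the set of released cells leaks nothing. Turning the noisy counts into probabilities is post-processing and does not consume additional privacy budget.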
