Distributed Machine Learning Oriented Data Integrity Verification Scheme in Cloud Computing Environment

Distributed Machine Learning (DML) is one of the core technologies for Artificial Intelligence (AI). However, in the existing distributed machine learning framework, the data integrity is not taken into account. If network attackers forge the data, modify the data, or destroy the data, the training model in the distributed machine learning system will be greatly affected, and the training results are led to be wrong. Therefore, it is crucial to guarantee the data integrity in the DML. In this paper, we propose a distributed machine learning oriented data integrity verification scheme (DML-DIV) to ensure the integrity of training data. Firstly, we adopt the idea of Provable Data Possession (PDP) sampling auditing algorithm to achieve data integrity verification so that our DML-DIV scheme can resist forgery attacks and tampering attacks. Secondly, we generate a random number, namely blinding factor, and apply the discrete logarithm problem (DLP) to construct proof and ensure privacy protection in the TPA verification process. Thirdly, we employ identity-based cryptography and two-step key generation technology to generate data owner’s public/private key pair so that our DML-DIV scheme can solve the key escrow problem and reduce the cost of managing the certificates. Finally, formal theoretical analysis and experimental results show the security and efficiency of our DML-DIV scheme.

[1]  Joseph M. Hellerstein,et al.  GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.

[2]  Jiguo Li,et al.  Certificateless Public Integrity Checking of Group Shared Data on Cloud Storage , 2018, IEEE Transactions on Services Computing.

[3]  Albert Y. Zomaya,et al.  Auditing Big Data Storage in Cloud Computing Using Divide and Conquer Tables , 2018, IEEE Transactions on Parallel and Distributed Systems.

[4]  Bharat Bhargava,et al.  Identity-Preserving Public Integrity Checking with Dynamic Groups for Cloud Storage , 2018 .

[5]  Hongliang Zhu,et al.  A Secure and Efficient Data Integrity Verification Scheme for Cloud-IoT Based on Short Signature , 2019, IEEE Access.

[6]  Jiguo Li,et al.  Remote Data Checking With a Designated Verifier in Cloud Storage , 2020, IEEE Systems Journal.

[7]  Shouhuai Xu,et al.  Efficient query integrity for outsourced dynamic databases , 2012, CCSW '12.

[8]  Alexander J. Smola,et al.  Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.

[9]  Hao Yan,et al.  A Novel Efficient Remote Data Possession Checking Protocol in Cloud Storage , 2017, IEEE Transactions on Information Forensics and Security.

[10]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[11]  Jianhong Zhang,et al.  Efficient ID-based public auditing for the outsourced data in cloud storage , 2016, Inf. Sci..

[12]  Stephen S. Yau,et al.  Efficient audit service outsourcing for data integrity in clouds , 2012, J. Syst. Softw..

[13]  Huaqun Wang,et al.  Identity-Based Distributed Provable Data Possession in Multicloud Storage , 2015, IEEE Transactions on Services Computing.

[14]  Reza Curtmola,et al.  Provable data possession at untrusted stores , 2007, CCS '07.

[15]  Cong Wang,et al.  Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing , 2010, 2010 Proceedings IEEE INFOCOM.

[16]  Jiguo Li,et al.  Efficient Identity-Based Provable Multi-Copy Data Possession in Multi-Cloud Storage , 2019, IEEE Transactions on Cloud Computing.

[17]  Zoe L. Jiang,et al.  Privacy-Preserving Public Auditing for Secure Cloud Storage , 2013, IEEE Transactions on Computers.

[18]  Aaron Q. Li,et al.  Parameter Server for Distributed Machine Learning , 2013 .

[19]  Hugo Krawczyk,et al.  HMQV: A High-Performance Secure Diffie-Hellman Protocol , 2005, CRYPTO.

[20]  Sanjay Ghemawat,et al.  MapReduce: a flexible data processing tool , 2010, CACM.

[21]  Alexander J. Smola,et al.  An architecture for parallel topic models , 2010, Proc. VLDB Endow..

[22]  Yong Yu,et al.  Identity-Based Remote Data Integrity Checking With Perfect Data Privacy Preserving for Cloud Storage , 2017, IEEE Transactions on Information Forensics and Security.

[23]  Jiankun Hu,et al.  Enabling Identity-Based Integrity Auditing and Data Sharing With Sensitive Information Hiding for Secure Cloud Storage , 2019, IEEE Transactions on Information Forensics and Security.

[24]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[25]  Huaqun Wang,et al.  Identity-Based Proxy-Oriented Data Uploading and Remote Data Integrity Checking in Public Cloud , 2016, IEEE Transactions on Information Forensics and Security.

[26]  Xuefeng Zheng,et al.  User stateless privacy-preserving TPA auditing scheme for cloud storage , 2019, J. Netw. Comput. Appl..

[27]  Robert H. Deng,et al.  Variations of Diffie-Hellman Problem , 2003, ICICS.

[28]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.