Efficient Privacy-Preserving Machine Learning in Hierarchical Distributed System

With the dramatic growth of data in both volume and scale, distributed machine learning has become an important tool for handling massive data in tasks such as prediction and classification. However, because of practical physical constraints and the risk of privacy leakage, it is infeasible to aggregate raw data from all data owners for learning. To tackle this problem, distributed privacy-preserving learning approaches have been introduced to learn over all distributed data without exposing the underlying information. Existing approaches, however, are limited in complicated distributed systems. On the one hand, traditional privacy-preserving learning approaches rely on heavy cryptographic primitives applied to the training data, whose computation overhead dramatically slows down learning. On the other hand, a complicated system architecture is itself a barrier in practical distributed systems. In this paper, we propose an efficient privacy-preserving machine learning scheme for hierarchical distributed systems, built on a modified and improved collaborative learning algorithm. The proposed scheme not only reduces the overhead of the learning process but also provides comprehensive protection for each layer of the hierarchical distributed system. In addition, based on an analysis of collaborative convergence in different learning groups, we propose an asynchronous strategy to further improve the learning efficiency of hierarchical distributed systems. Finally, extensive experiments on real-world data are conducted to evaluate the privacy, efficacy, and efficiency of the proposed schemes.
