Distributed Training for Multi-Layer Neural Networks by Consensus

Over the past decade, interest has grown in large-scale and privacy-sensitive machine learning, especially in settings where data cannot be shared due to privacy protection or cannot be centralized due to computational limitations. Parallel computation has been proposed to circumvent these limitations, usually based on master–slave or decentralized topologies; comparative studies show that a decentralized graph avoids a possible communication bottleneck at the central agent but incurs extra communication cost. In this brief, a consensus algorithm is designed that allows all agents over a decentralized graph to converge to one another, so that distributed neural networks with enough consensus steps achieve nearly the same performance as a centrally trained model. A convergence analysis proves that all agents over an undirected graph converge to the same optimal model even with only a single consensus step per iteration, which significantly reduces the communication cost. Simulation studies demonstrate that the proposed distributed training algorithm for multi-layer neural networks, which requires no data exchange, exhibits comparable or even better performance than the centralized training model.
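To make the scheme concrete, here is a minimal sketch of decentralized training by consensus, not the authors' exact algorithm: each agent holds a private data shard, takes a local gradient step, and then performs a single consensus (averaging) step with its neighbors over an undirected graph. For brevity, a least-squares model stands in for the multi-layer network, and the ring topology with Metropolis mixing weights is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy problem: least-squares regression, one private shard per agent ---
n_agents, n_samples, dim = 4, 50, 5
w_true = rng.normal(size=dim)
shards = []
for _ in range(n_agents):
    X = rng.normal(size=(n_samples, dim))
    y = X @ w_true + 0.1 * rng.normal(size=n_samples)
    shards.append((X, y))

def local_grad(w, X, y):
    """Gradient of the local least-squares loss on this agent's shard."""
    return X.T @ (X @ w - y) / len(y)

# Undirected ring graph; Metropolis weights yield a symmetric,
# doubly stochastic mixing matrix W (rows and columns sum to 1),
# which is what the consensus analysis typically assumes.
A = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    A[i, (i + 1) % n_agents] = A[(i + 1) % n_agents, i] = 1
deg = A.sum(axis=1)
W = np.zeros_like(A)
for i in range(n_agents):
    for j in range(n_agents):
        if A[i, j]:
            W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

# --- Decentralized loop: local gradient step + ONE consensus step ---
step = 0.05
w = rng.normal(size=(n_agents, dim))  # row i holds agent i's parameters
for t in range(500):
    grads = np.stack([local_grad(w[i], *shards[i]) for i in range(n_agents)])
    w = W @ (w - step * grads)        # average parameters with neighbors

print("disagreement across agents:", np.max(np.abs(w - w.mean(axis=0))))
print("distance to w_true:", np.linalg.norm(w.mean(axis=0) - w_true))
```

Raw data never leaves an agent; only model parameters are exchanged along graph edges, and the single mixing step `W @ (...)` per iteration is the low-communication regime the brief argues is sufficient for all agents to reach the same optimal model.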
