Parallel and distributed training of neural networks via successive convex approximation

The aim of this paper is to develop a theoretical framework for training neural network (NN) models when data are distributed over a set of agents connected by a sparse network topology. The framework builds on a distributed convexification technique, while leveraging dynamic consensus to propagate information over the network. It can be customized to work with the different loss and regularization functions typically used when training NN models, while guaranteeing provable convergence to a stationary solution under mild assumptions. Interestingly, it naturally leads to distributed architectures in which agents solve their local optimization problems by exploiting parallel multi-core processors. Numerical results corroborate our theoretical findings and assess the performance of the proposed framework for parallel and distributed training of neural networks.
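To make the idea concrete, here is a minimal NumPy sketch of one possible instantiation of the scheme described above: each agent minimizes a strongly convex surrogate of the network-wide loss (here a fully linearized surrogate, which reduces the local step to a gradient-tracking update), followed by a consensus step on the model variables and a dynamic-consensus step that tracks the average gradient. All names (`local_grad`, `surrogate_step`, `distributed_sca`, the mixing matrix `W`) and the least-squares toy loss are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def local_grad(X, y, w):
    # Gradient of a least-squares local loss, standing in for an NN loss gradient.
    return X.T @ (X @ w - y) / len(y)

def surrogate_step(x_i, g_avg, tau):
    # Minimizer of the fully linearized convex surrogate
    #   g_avg^T (z - x_i) + (tau/2) ||z - x_i||^2,
    # which has the closed form below. Richer surrogates (e.g., keeping the
    # local loss exact) would instead require a small local convex solver.
    return x_i - g_avg / tau

def distributed_sca(X_list, y_list, W, iters=300, tau=5.0, alpha=0.5):
    """One model copy per agent; W is assumed to be a symmetric, doubly
    stochastic mixing matrix matching the sparse network topology."""
    n, d = len(X_list), X_list[0].shape[1]
    x = np.zeros((n, d))                                 # local model copies
    g = np.stack([local_grad(X_list[i], y_list[i], x[i]) for i in range(n)])
    y_trk = g.copy()                                     # dynamic-consensus (gradient-tracking) variables
    for _ in range(iters):
        # 1) each agent solves its local convexified subproblem
        z = np.stack([surrogate_step(x[i], y_trk[i], tau) for i in range(n)])
        v = x + alpha * (z - x)                          # damped update
        # 2) consensus step on the model variables
        x_new = W @ v
        # 3) dynamic consensus step tracking the network-average gradient
        g_new = np.stack([local_grad(X_list[i], y_list[i], x_new[i]) for i in range(n)])
        y_trk = W @ y_trk + (g_new - g)
        x, g = x_new, g_new
    return x
```

The damping parameter `alpha` and the proximal weight `tau` play the usual role of trading off per-iteration progress against stability; in a full treatment, the surrogate would retain the (nonconvex) local loss in a convexified form rather than linearizing it completely.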
