Ordered Gradient Approach for Communication-Efficient Distributed Learning

Training machine learning models with multiple gradient-computing workers has recently attracted great interest. Communication efficiency is an important consideration in such distributed learning settings, especially when communication is expensive in terms of power usage. We develop a new approach that is efficient in the number of transmissions: only the most informative worker results are transmitted, which reduces the total number of transmissions. For nonconvex smooth loss functions, our ordered gradient approach provably achieves the same order of convergence rate as gradient descent, while gradient descent always requires more communications. In some cases, experiments show significant communication savings compared to the best existing approaches.
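The idea of transmitting only the most informative worker results can be illustrated with a minimal sketch. It is not the algorithm analyzed in the paper. It assumes that "informative" means the size of the change between a worker's fresh gradient and the gradient it last transmitted, that exactly `k` workers transmit per round, and that the server reuses cached gradients for the rest. The function names, the scalar least-squares loss, the data shards, and the step size are all hypothetical choices made for illustration.

```python
def local_gradient(theta, data):
    """Gradient of 0.5 * sum((theta * x - y) ** 2) over one worker's samples."""
    return sum((theta * x - y) * x for x, y in data)


def ordered_gradient_descent(shards, theta=0.0, step=0.01, k=1, iters=200):
    """Gradient descent in which at most k workers transmit per round.

    ASSUMPTION (not from the paper): workers are ranked by how much their
    fresh gradient differs from the gradient they last transmitted, and the
    server keeps using the cached gradient of every worker that stays silent.
    """
    last_sent = [0.0] * len(shards)  # server-side cache, one entry per worker
    transmissions = 0
    for _ in range(iters):
        fresh = [local_gradient(theta, d) for d in shards]
        change = [abs(g - s) for g, s in zip(fresh, last_sent)]
        # Order workers from most to least informative under the assumed measure.
        order = sorted(range(len(shards)), key=lambda i: change[i], reverse=True)
        for i in order[:k]:
            if change[i] > 0.0:  # nothing new to report otherwise
                last_sent[i] = fresh[i]
                transmissions += 1
        # The server steps along the sum of cached (possibly stale) gradients.
        theta -= step * sum(last_sent)
    return theta, transmissions


# Hypothetical toy data: three workers, every sample satisfies y = 2 * x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(1.0, 2.0), (1.0, 2.0)]]

if __name__ == "__main__":
    print(ordered_gradient_descent(shards, k=1))            # one transmission per round
    print(ordered_gradient_descent(shards, k=len(shards)))  # every worker transmits
```

Setting `k` to the number of workers recovers ordinary gradient descent in this sketch, so running both calls gives a rough comparison of the transmission counts on the toy problem.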
