An Exact Quantized Decentralized Gradient Descent Algorithm

We consider the problem of decentralized consensus optimization, in which the sum of $n$ smooth and strongly convex functions is minimized over $n$ distributed agents that form a connected network. In particular, we consider the case in which the local decision variables communicated among nodes are quantized in order to alleviate the communication bottleneck in distributed optimization. We propose the Quantized Decentralized Gradient Descent (QDGD) algorithm, in which nodes update their local decision variables by combining the quantized information received from their neighbors with their local information. We prove that, under standard strong convexity and smoothness assumptions on the objective function, QDGD achieves a vanishing mean solution error under customary conditions on the quantizers. To the best of our knowledge, this is the first algorithm that achieves vanishing consensus error in the presence of quantization noise. Moreover, we provide simulation results that show tight agreement between our derived theoretical convergence rate and the numerical results.
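The update described above can be illustrated with a minimal sketch. The abstract does not state the update rule, so the form used here, $x_{i,t+1} = (1 - \varepsilon + \varepsilon w_{ii})\,x_{i,t} + \varepsilon \sum_{j \in \mathcal{N}_i} w_{ij}\, Q(x_{j,t}) - \alpha \varepsilon \nabla f_i(x_{i,t})$, is an assumption recalled from the QDGD literature. The ring topology, the uniform weights, the scalar quadratic objectives $f_i(x) = \tfrac{1}{2}(x - b_i)^2$, the stochastic-rounding quantizer, and the names `stochastic_round` and `qdgd_ring` are likewise hypothetical choices made for illustration only.

```python
import math
import random


def stochastic_round(x, step=0.5, rng=random):
    """Unbiased quantizer (assumed): randomly round x to a grid of spacing `step`."""
    low = math.floor(x / step) * step
    p_up = (x - low) / step  # rounding up with this probability gives E[Q(x)] = x
    return low + step if rng.random() < p_up else low


def qdgd_ring(targets, iters=20000, eps=0.01, alpha=0.1, step=0.5, seed=0):
    """QDGD-style iteration on a ring (n >= 3) for f_i(x) = 0.5 * (x - targets[i])**2.

    The update form and all parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    w_self = w_nbr = 1.0 / 3.0  # doubly stochastic weights: self plus two neighbors
    for _ in range(iters):
        # Each node transmits only a quantized copy of its local variable.
        z = [stochastic_round(v, step, rng) for v in x]
        new_x = []
        for i in range(n):
            nbr_sum = w_nbr * (z[(i - 1) % n] + z[(i + 1) % n])
            grad = x[i] - targets[i]  # gradient of the local quadratic
            new_x.append(
                (1 - eps + eps * w_self) * x[i]  # exact local information
                + eps * nbr_sum                  # quantized neighbor information
                - alpha * eps * grad             # local gradient step
            )
        x = new_x
    return x


if __name__ == "__main__":
    final = qdgd_ring([1.0, 2.0, 3.0, 4.0, 5.0])
    print("local variables:", final)
    print("network average:", sum(final) / len(final))
```

In this toy problem the minimizer of the sum is the mean of the targets. With fixed `eps` and `alpha`, the local variables only settle in a neighborhood of that point, while the network average stays close to it. A vanishing error of the kind claimed in the abstract would require the step-size parameters to shrink with the number of iterations, which this sketch does not attempt.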
