Fast cooperative distributed learning

We consider distributed optimization where N agents in a network cooperatively minimize the sum of their individual convex costs. For this problem, the existing literature proposes distributed gradient-like algorithms that are attractive for their computationally simple iterations but suffer from slow convergence (in the iteration count k) to a solution. We propose a distributed gradient-like algorithm built from the (centralized) Nesterov gradient method. For convex costs f_i with Lipschitz continuous and bounded gradients, we show that our method converges at rate O(log k / k). This rate significantly improves over the convergence rates of existing distributed gradient-like methods, while the proposed algorithm maintains the same communication cost per iteration and a very similar computational cost per iteration. We further show that the rate O(log k / k) still holds if the bounded-gradients assumption is replaced by a certain linear growth assumption. We illustrate the gains of our method on two simulation examples: acoustic source localization and learning a linear classifier based on an l2-regularized logistic loss.
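To make the flavor of such a method concrete, the following is a minimal sketch of a distributed Nesterov-like gradient iteration. It is an illustration, not the paper's exact algorithm: the mixing matrix W, the diminishing step size alpha_k = c/(k+1), the momentum factor k/(k+3), and the toy quadratic costs are all assumptions chosen to mirror the general pattern (neighbor averaging plus a Nesterov momentum step).

```python
import numpy as np

def distributed_nesterov(grads, W, x0, c=0.5, iters=2000):
    """Hypothetical distributed Nesterov-like iteration.

    grads: list of N local gradient functions grad f_i
    W:     N x N doubly stochastic mixing (consensus) matrix
    x0:    (N, d) array of initial local iterates
    """
    x = x0.copy()
    y = x0.copy()
    for k in range(iters):
        alpha = c / (k + 1)  # diminishing step size (assumed schedule)
        g = np.array([grads[i](y[i]) for i in range(len(grads))])
        # Each agent averages neighbors' iterates, then takes a gradient step.
        x_new = W @ y - alpha * g
        # Nesterov-style momentum extrapolation.
        y = x_new + (k / (k + 3.0)) * (x_new - x)
        x = x_new
    return x

# Toy example: 3 agents with scalar quadratic costs f_i(x) = (x - b_i)^2 / 2;
# the sum is minimized at the mean of the b_i's.
b = np.array([1.0, 2.0, 6.0])
grads = [lambda x, bi=bi: x - bi for bi in b]
W = np.full((3, 3), 1.0 / 3.0)  # complete-graph averaging weights
x = distributed_nesterov(grads, W, np.zeros((3, 1)))
```

In this sketch, each iteration costs one gradient evaluation per agent and one exchange of iterates with neighbors, matching the per-iteration communication profile of plain distributed (sub)gradient methods while adding only the momentum bookkeeping.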
