D-FW: Communication efficient distributed algorithms for high-dimensional sparse optimization

We propose distributed algorithms for high-dimensional sparse optimization. In many applications, the parameter is sparse but high-dimensional. This is pathological for existing distributed algorithms, which require an information exchange stage that transmits the full parameter, and the parameter may not be sparse during the intermediate steps of optimization. The novelty of this work is to develop communication-efficient algorithms based on the stochastic Frank-Wolfe (sFW) algorithm, in which the gradient computation is inexact but its error is controllable. For a star network topology, we propose an algorithm with low communication cost and establish its convergence. The proposed algorithm is then extended to perform decentralized optimization on a general network topology. Numerical experiments are conducted to verify our findings.
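To illustrate why Frank-Wolfe updates suit sparse, communication-limited settings, the following is a minimal sketch and not the D-FW algorithm itself. It assumes a least-squares loss, an l1-ball constraint, exact (rather than inexact) gradients, and a star network in which a central node sums the workers' local gradients. The function names `split_rows`, `aggregated_gradient`, and `frank_wolfe_l1` are hypothetical and chosen for this example only.

```python
import numpy as np

def split_rows(A, b, num_workers):
    """Partition the rows of (A, b) across workers (assumed data layout)."""
    return list(zip(np.array_split(A, num_workers), np.array_split(b, num_workers)))

def aggregated_gradient(shards, x):
    """Gradient of f(x) = (1/2n) * sum_k ||A_k x - b_k||^2, summed at the hub."""
    n = sum(A_k.shape[0] for A_k, _ in shards)
    g = np.zeros_like(x)
    for A_k, b_k in shards:
        g += A_k.T @ (A_k @ x - b_k)  # local gradient computed by worker k
    return g / n

def frank_wolfe_l1(grad_fn, dim, radius, num_iters):
    """Frank-Wolfe over the l1 ball {x : ||x||_1 <= radius}."""
    x = np.zeros(dim)
    for t in range(num_iters):
        g = grad_fn(x)
        # Linear minimization over the l1 ball is attained at a vertex
        # -radius * sign(g_i) * e_i, where i is the largest-magnitude coordinate.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros(dim)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```

Each iteration moves toward a single signed coordinate, so the update direction can be described by one index and one sign, and the iterate after `num_iters` steps has at most `num_iters` nonzero entries. In this sketch the workers still send dense local gradients to the hub; reducing that cost is the part addressed by the proposed algorithms and is not reproduced here.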
