A projection-free decentralized algorithm for non-convex optimization

This paper considers a decentralized projection-free algorithm for non-convex optimization in high dimensions. More specifically, we propose a Decentralized Frank-Wolfe (DeFW) algorithm, which is suitable when high-dimensional optimization constraints are difficult to handle with conventional projection- or proximal-based gradient descent methods. We present conditions under which the DeFW algorithm converges to a stationary point and prove that the rate of convergence is as fast as O(1/√T), where T is the number of iterations. This paper provides the first convergence guarantee for Frank-Wolfe methods applied to non-convex decentralized optimization. Building on these theoretical findings, we formulate a novel robust matrix completion problem and apply DeFW to obtain an efficient decentralized solution. Numerical experiments on realistic and synthetic data support our findings.
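To make the projection-free idea concrete, the following is a minimal sketch of a consensus-based Frank-Wolfe iteration. It is not the paper's DeFW algorithm: the abstract does not give the update rules, so everything here is an illustrative assumption, including the ℓ1-ball constraint, the mixing matrix `W`, the single averaging round per iteration, the step size 2/(t+2), and the helper names `lmo_l1`, `mix`, and `decentralized_fw`. The point it shows is that each agent only solves a linear minimization over the constraint set, so no projection is ever computed.

```python
import math

def lmo_l1(grad, radius):
    """Linear minimization oracle over the l1 ball:
    returns argmin of <s, grad> subject to ||s||_1 <= radius (a signed vertex)."""
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius * math.copysign(1.0, grad[i])
    return s

def mix(W, vectors):
    """One consensus round: agent i forms the W[i]-weighted average of all vectors."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(W[i][j] * vectors[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]

def decentralized_fw(local_grads, W, dim, radius, iters):
    """Illustrative consensus + Frank-Wolfe loop (assumed scheme, not the paper's)."""
    n = len(local_grads)
    x = [[0.0] * dim for _ in range(n)]       # the origin is feasible for the l1 ball
    for t in range(iters):
        x_avg = mix(W, x)                     # average neighbours' iterates
        g = [local_grads[i](x_avg[i]) for i in range(n)]
        g_avg = mix(W, g)                     # average neighbours' gradients
        gamma = 2.0 / (t + 2.0)               # assumed diminishing step size
        x = [[(1.0 - gamma) * xi + gamma * si
              for xi, si in zip(x_avg[i], lmo_l1(g_avg[i], radius))]
             for i in range(n)]
    return x

# Hypothetical usage: 3 agents on a complete graph, local losses 0.5*||x - b_i||^2.
targets = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
grads = [lambda x, b=b: [xk - bk for xk, bk in zip(x, b)] for b in targets]
W = [[1.0 / 3.0] * 3 for _ in range(3)]       # uniform averaging weights
x_final = decentralized_fw(grads, W, dim=2, radius=1.0, iters=50)
```

Because every update is a convex combination of feasible points, each agent's iterate stays inside the ℓ1 ball by construction. The quadratic local losses in the usage example are convex and chosen only for brevity; the gradient callables can be replaced with gradients of non-convex losses without changing the loop.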
