Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Consider the problem of minimizing the sum of a smooth (possibly nonconvex) function and a convex (possibly nonsmooth) function over a large number of variables. A popular approach to this problem is the block coordinate descent (BCD) method, in which at each iteration only one block of variables is updated while the remaining blocks are held fixed. With recent advances in multi-core parallel processing technology, it is desirable to parallelize BCD by allowing multiple blocks to be updated simultaneously at each iteration. In this work, we propose an inexact parallel BCD approach in which, at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method under both randomized and cyclic block selection rules, and we analyze the asymptotic and non-asymptotic convergence behavior of the algorithm for both convex and nonconvex objective functions. Numerical experiments suggest that, for the special case of the Lasso problem, the cyclic block selection rule can outperform the randomized rule.
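To make the parallel update concrete, below is a minimal sketch of one possible instantiation for the Lasso case mentioned in the abstract: each selected coordinate takes a proximal-gradient step on a separable quadratic surrogate of the smooth term (a standard convex approximation), and blocks are chosen either cyclically or at random. The function names (`parallel_bcd_lasso`, `soft_threshold`), the fixed block partition, and the independent within-block updates are illustrative assumptions, not the paper's exact algorithm, which may use different approximations and step-size safeguards.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * |.|: shrinks v toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def parallel_bcd_lasso(A, b, lam, n_blocks=8, rule="cyclic",
                       n_iters=200, seed=0):
    """Sketch of parallel block coordinate descent for
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    Each iteration selects one block of coordinates (cyclically or
    at random) and updates all of its coordinates simultaneously via
    a separable quadratic surrogate, i.e. one proximal-gradient step
    per coordinate.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    residual = A @ x - b
    # Coordinate-wise Lipschitz constants of the smooth part
    # (squared column norms of A); small offset avoids division by zero.
    L = np.sum(A * A, axis=0) + 1e-12
    blocks = np.array_split(np.arange(n), n_blocks)
    for it in range(n_iters):
        if rule == "cyclic":
            block = blocks[it % n_blocks]
        else:  # randomized block selection
            block = blocks[rng.integers(n_blocks)]
        # Gradient of the smooth part restricted to the block; in a
        # true parallel implementation each coordinate below would be
        # handled by its own core.
        g = A[:, block].T @ residual
        x_new = soft_threshold(x[block] - g / L[block], lam / L[block])
        # Keep the residual A x - b consistent with the new iterate.
        residual += A[:, block] @ (x_new - x[block])
        x[block] = x_new
    return x
```

A typical call might be `x = parallel_bcd_lasso(A, b, lam=0.1)`. The step-size choice (here, the coordinate-wise Lipschitz constants) matters: fully simultaneous updates can overshoot when the columns of A within a block are strongly correlated, which is one motivation for the convergence analysis of inexact, approximation-based parallel updates studied in the paper.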
