Coordinate Friendly Structures, Algorithms and Applications

This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, each of which updates one variable, or a small block of variables, while fixing the others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, and convex and nonconvex problems. In addition, they are easy to parallelize. The strong performance of coordinate update methods depends on the subproblems being simple to solve. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates. Based on the discovered coordinate-friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, and sub-areas of optimization. Several problems are treated with coordinate update for the first time. The obtained algorithms scale to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.
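To make the idea of a low-cost coordinate update concrete, the following is a minimal sketch and is not taken from the paper. It assumes the lasso problem, minimize 0.5*||Ax - b||^2 + lam*||x||_1, as an illustrative instance. The function names `soft_threshold` and `lasso_coordinate_descent` and the cyclic sweep order are hypothetical choices made for this example. Each subproblem updates a single variable in closed form while the others stay fixed, and caching the residual b - Ax keeps the cost of one update at O(m) for an m-by-n matrix A.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding: the proximal map of t * |.| at z."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(A, b, lam, n_sweeps=100):
    """Cyclic coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1.

    Illustrative sketch only: the problem, names, and sweep order are
    assumptions for this example, not the paper's algorithms.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    x = np.zeros(n)
    r = b.copy()                    # cached residual b - A x (x starts at 0)
    col_sq = (A ** 2).sum(axis=0)   # squared column norms, computed once

    for _ in range(n_sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue            # a zero column cannot reduce the residual
            a_j = A[:, j]
            # Correlation of column j with the residual that excludes x[j].
            rho = a_j @ r + col_sq[j] * x[j]
            x_new = soft_threshold(rho, lam) / col_sq[j]
            # O(m) residual refresh instead of recomputing A @ x.
            r -= a_j * (x_new - x[j])
            x[j] = x_new
    return x
```

For example, with `A = np.eye(3)`, `b = [3.0, -0.5, 1.0]`, and `lam = 1.0`, the problem separates by coordinate and the routine returns the soft-thresholded vector `[2, 0, 0]`. The cached residual is what makes the update cheap in this sketch: without it, every single-variable step would require a full matrix-vector product.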
