Coordinate descent with arbitrary sampling II: expected separable overapproximation

The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depend on the notion of expected separable overapproximation (ESO). This refers to an inequality involving the objective function and the sampling, capturing in a compact way certain smoothness properties of the function in a random subspace spanned by the sampled coordinates. ESO inequalities were previously established for special classes of samplings only, almost invariably for uniform samplings. In this paper we develop a systematic technique for deriving these inequalities for a large class of functions and for arbitrary samplings. We demonstrate that one can recover existing ESO results using our general approach, which is based on the study of eigenvalues associated with samplings and the data describing the function.
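For orientation, the ESO inequality can be sketched as follows; the notation (marginal probabilities $p_i$, weights $v_i$) is the standard one in this literature, reconstructed here as a sketch rather than quoted from the paper. Given a smooth function $f : \mathbb{R}^n \to \mathbb{R}$, a sampling $\hat{S}$ (a random subset of $\{1,\dots,n\}$) with marginals $p_i = \mathbf{P}(i \in \hat{S})$, and positive weights $v_1,\dots,v_n$, one writes $(f,\hat{S}) \sim \mathrm{ESO}(v)$ if for all $x, h \in \mathbb{R}^n$:

\[
\mathbf{E}\big[ f(x + h_{[\hat{S}]}) \big] \;\le\; f(x) \;+\; \sum_{i=1}^{n} p_i \Big( \nabla_i f(x)\, h_i + \tfrac{v_i}{2}\, h_i^2 \Big),
\qquad
h_{[\hat{S}]} := \sum_{i \in \hat{S}} h_i e_i .
\]

The right-hand side is separable in $h$, which is what allows the sampled coordinates to be updated independently and in parallel; smaller admissible weights $v_i$ translate directly into better complexity bounds.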
