Stochastic Coordinate Minimization with Progressive Precision for Stochastic Convex Optimization

A framework based on iterative coordinate minimization (CM) is developed for stochastic convex optimization. Since exact coordinate minimization is infeasible when the objective function is known only through noisy observations, the crux of the proposed algorithm is the optimal control of the minimization precision at each iteration. We establish the optimal precision control and the resulting order-optimal regret performance for strongly convex functions with separable nonsmooth components. An interesting finding is that the optimal progression of precision across iterations is independent of the low-dimensional CM routine employed, suggesting a general framework for extending low-dimensional optimization routines to high-dimensional problems. The proposed algorithm is amenable to online implementation and inherits the scalability and parallelizability of CM for large-scale optimization. Requiring only a sublinear number of message exchanges, it also lends itself to distributed computing better than the alternative approach of coordinate gradient descent.
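
The abstract describes the scheme only at a high level, so the following is a minimal Python sketch of the idea: an outer loop selects a coordinate and approximately minimizes along it with a stochastic routine whose precision is tightened as the iterations progress. The uniform coordinate choice, the Robbins-Monro-style inner routine, the linearly growing sample budget inner_budget, and the oracle name stochastic_grad are illustrative assumptions, not the paper's prescribed precision schedule.

    import numpy as np

    def stochastic_coordinate_minimization(
        stochastic_grad,                      # assumed oracle: (x, j) -> noisy partial derivative w.r.t. x[j]
        x0,                                   # initial point
        num_iterations=100,
        inner_budget=lambda t: 10 * (t + 1),  # assumed schedule: more inner samples in later iterations
        step=0.1,
    ):
        """Sketch of coordinate minimization with per-iteration precision control.

        At outer iteration t, one coordinate is chosen and approximately minimized
        using inner_budget(t) noisy gradient steps; the growing budget is a stand-in
        for the paper's optimal precision progression.
        """
        x = np.array(x0, dtype=float)
        d = x.size
        for t in range(num_iterations):
            j = np.random.randint(d)          # pick a coordinate uniformly at random (an assumption)
            # Approximate 1D minimization along coordinate j: Robbins-Monro steps
            # with a decaying inner step size; more inner steps yield a more
            # precise coordinate-wise minimizer.
            for k in range(inner_budget(t)):
                g = stochastic_grad(x, j)
                x[j] -= (step / (k + 1)) * g
        return x

    # Usage example on a noisy quadratic f(x) = 0.5 * ||x - 1||^2 observed with Gaussian noise.
    rng = np.random.default_rng(0)
    noisy_grad = lambda x, j: (x[j] - 1.0) + 0.1 * rng.standard_normal()
    x_hat = stochastic_coordinate_minimization(noisy_grad, x0=np.zeros(5))

Because each inner loop touches only one coordinate and the precision schedule does not depend on the inner routine, the same outer structure could wrap any low-dimensional stochastic solver, which is the extensibility the abstract highlights.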
