Doubly Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization with Factorized Data

We propose a doubly stochastic primal-dual coordinate optimization algorithm for regularized empirical risk minimization problems that can be formulated as saddle-point problems. Unlike existing coordinate methods, the proposed method randomly samples both primal and dual coordinates when updating the solution, which is desirable when the data are both high-dimensional and large in sample size. The convergence of our method is established not only in terms of the solution's distance to optimality but also in terms of the primal-dual objective gap. When the data matrix is already factorized as a product of two smaller matrices, we show that the proposed method has lower overall complexity than other coordinate methods, especially when the data size is large.
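
To make the sampling scheme concrete, the following is a minimal, illustrative Python sketch of a doubly stochastic primal-dual coordinate update applied to the saddle-point formulation min_x max_y (1/n) y^T A x - (1/n) sum_i phi_i*(y_i) + (lambda/2)||x||^2. The choice of squared loss, ridge regularizer, step sizes, extrapolation parameter, and the function name `dspdc` are assumptions made for illustration; this is not the paper's exact algorithm, parameter schedule, or factorized-data variant.

```python
import numpy as np

def dspdc(A, b, lam=0.1, sigma=0.1, tau=0.1, theta=0.9, n_iters=20000, seed=0):
    """Illustrative doubly stochastic primal-dual coordinate sketch for
    min_x max_y (1/n) y^T A x - (1/n) sum_i phi_i*(y_i) + (lam/2)||x||^2,
    with squared loss phi_i(z) = (z - b_i)^2 / 2, i.e. phi_i*(y) = y^2/2 + b_i*y.
    Step sizes sigma, tau and extrapolation theta are placeholders, not tuned values."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)       # primal variable (one coordinate per feature)
    x_bar = x.copy()      # extrapolated primal point used in the dual step
    y = np.zeros(n)       # dual variable (one coordinate per example)
    for _ in range(n_iters):
        i = rng.integers(n)   # sample a dual coordinate (an example)
        j = rng.integers(d)   # sample a primal coordinate (a feature)
        # Dual coordinate ascent step: prox of phi_i*(y) = y^2/2 + b_i*y.
        y[i] = (y[i] + sigma * (A[i] @ x_bar - b[i])) / (1.0 + sigma)
        # Primal coordinate descent step: prox of the ridge term on x_j.
        # The partial gradient of the coupling term is recomputed exactly here
        # for clarity; a practical implementation would maintain A^T y
        # incrementally so that each iteration stays cheap.
        grad_j = (A[:, j] @ y) / n
        x_new_j = (x[j] - tau * grad_j) / (1.0 + tau * lam)
        # Extrapolate the updated primal coordinate for the next dual step.
        x_bar[j] = x_new_j + theta * (x_new_j - x[j])
        x[j] = x_new_j
    return x, y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    x_true = rng.standard_normal(50)
    b = A @ x_true + 0.1 * rng.standard_normal(200)
    x_hat, _ = dspdc(A, b)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Each iteration touches a single example row and a single feature column of A, which is the point of sampling both primal and dual coordinates; with a factorized data matrix A = UV, the same column and row accesses would instead go through the two smaller factors.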
