Accelerated dual-averaging primal–dual method for composite convex minimization

ABSTRACT Dual averaging-type methods are widely used in industrial machine learning applications because they efficiently promote solution structure (e.g., sparsity). In this paper, we propose a novel accelerated dual-averaging primal–dual algorithm for minimizing a composite convex function. We also derive a stochastic version of the proposed method for solving empirical risk minimization, and we demonstrate its advantages in handling sparse data both theoretically and empirically.
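To make the dual-averaging template concrete, the sketch below implements the classical (non-accelerated) regularized dual averaging update of Xiao (2010) for an l1-composite objective min_x f(x) + lam*||x||_1. This is a minimal illustration, not the accelerated primal–dual method proposed in the paper; the function names, the step-size rule beta_t = gamma/sqrt(t), and all parameters are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def rda_l1(grad_f, x0, lam, gamma, n_iters):
    """Plain regularized dual averaging (sketch, not the paper's method) for
    min_x f(x) + lam * ||x||_1.

    grad_f : callable returning a (sub)gradient of the smooth part f at x
    x0     : initial point (numpy array)
    lam    : l1 regularization weight (this is what induces sparsity)
    gamma  : step-size constant for the strongly convex prox term (assumed)
    """
    x = x0.copy()
    g_bar = np.zeros_like(x0)  # running average of all observed gradients
    for t in range(1, n_iters + 1):
        g_bar += (grad_f(x) - g_bar) / t   # dual-averaging update of the mean gradient
        beta = gamma / np.sqrt(t)          # coefficient of the (1/2)*||x||^2 prox term
        # Closed-form minimizer of <g_bar, x> + lam*||x||_1 + (beta/2)*||x||^2:
        # every coordinate with |g_bar_i| <= lam is set exactly to zero.
        x = -soft_threshold(g_bar, lam) / beta
    return x
```

Note that the soft-thresholding here acts on the *averaged* gradient rather than on the current iterate, so coordinates whose average gradient magnitude stays below lam are zeroed exactly at every step; this is the mechanism by which dual-averaging methods promote sparse solutions more aggressively than proximal gradient steps on single gradients.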
