On Stochastic Primal-Dual Hybrid Gradient Approach for Compositely Regularized Minimization

We consider a wide spectrum of regularized stochastic minimization problems in which the regularization term is composed with a linear function. Examples of this formulation include graph-guided regularized minimization, the generalized lasso, and a class of ℓ1-regularized problems. The computational challenge is that, because of the linear composition, the proximal mapping associated with the regularization term has no closed-form solution. Fortunately, the structure of the regularization term allows us to reformulate the problem as a convex-concave saddle point problem, which can be solved with the Primal-Dual Hybrid Gradient (PDHG) approach. However, PDHG may be inefficient in realistic applications, because computing the full gradient of the expected objective function can be very expensive when the number of input data samples is large. To address this issue, we propose a Stochastic PDHG (SPDHG) algorithm with either uniformly or non-uniformly averaged iterates. With uniformly averaged iterates, SPDHG converges in expectation at a rate of O(1/√t) for general convex objectives and O(log(t)/t) for strongly convex objectives, where t denotes the number of iterations. With non-uniformly averaged iterates, SPDHG converges in expectation at a rate of O(1/t) for strongly convex objectives. Numerical experiments on several types of datasets demonstrate that the proposed algorithm outperforms competing algorithms.
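
The saddle point reformulation mentioned above can be written out explicitly. The notation below is our own and is not taken from the paper: ℓ(x, ξ) is the loss on a random sample ξ, A is the linear map composed with the regularizer r, and r* is the Fenchel conjugate of r. If r is proper, closed and convex, then r(z) = sup_y {⟨z, y⟩ − r*(y)}, which gives the equality in the first line. The last line shows one illustrative pair of averaging schemes, a uniform average and a linearly weighted one. The exact non-uniform weights used by SPDHG are not given in this abstract.

```latex
% Notation (assumed): \ell(x,\xi) loss on sample \xi, A linear map, r^* Fenchel conjugate of r.
\min_{x}\; \mathbb{E}_{\xi}\big[\ell(x,\xi)\big] + r(Ax)
  \;=\; \min_{x}\,\max_{y}\; \mathbb{E}_{\xi}\big[\ell(x,\xi)\big] + \langle Ax,\, y\rangle - r^{*}(y),
  \qquad r^{*}(y) = \sup_{z}\,\big\{\langle z, y\rangle - r(z)\big\}.

% Example: r(z) = \lambda\|z\|_1 (generalized lasso, graph-guided penalties)
r^{*}(y) = \begin{cases} 0, & \|y\|_\infty \le \lambda,\\ +\infty, & \text{otherwise,} \end{cases}
\qquad \operatorname{prox}_{\sigma r^{*}}(v) = \min\{\max\{v, -\lambda\}, \lambda\}\ \text{(componentwise)}.

% Averaged iterates after t steps: uniform, and one common non-uniform (linearly weighted) choice
\bar{x}_t = \frac{1}{t}\sum_{k=1}^{t} x_k,
\qquad
\tilde{x}_t = \frac{2}{t(t+1)}\sum_{k=1}^{t} k\, x_k .
```

For ℓ1-type penalties, the dual proximal step is therefore a componentwise clip. The primal step needs only a (stochastic) gradient of the loss plus Aᵀy, so the proximal mapping of r(A·) itself is never evaluated.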

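The following minimal NumPy sketch shows a stochastic primal-dual iteration for this kind of saddle point. As an assumption for illustration, it uses a least-squares loss and a generalized-lasso penalty λ‖Dx‖₁, where D is a first-difference matrix. Each step samples a single data point, so the full gradient is never formed. The sketch keeps both a uniform and a linearly weighted running average of the primal iterates. The update order, the step sizes (constant σ, η_k = η0/√k), and the helper names `first_difference`, `objective` and `spdhg_sketch` are hypothetical choices of ours. They are not the paper's SPDHG, and the sketch inherits none of its convergence guarantees.

```python
import numpy as np


def first_difference(d):
    """Return the (d-1) x d first-difference matrix D, so ||D x||_1 is the 1-D total
    variation of x (a generalized-lasso penalty, chosen here only for illustration)."""
    D = np.zeros((d - 1, d))
    rows = np.arange(d - 1)
    D[rows, rows] = -1.0
    D[rows, rows + 1] = 1.0
    return D


def objective(x, X, b, D, lam):
    """F(x) = (1/2n) ||X x - b||^2 + lam * ||D x||_1."""
    residual = X @ x - b
    return 0.5 * np.mean(residual ** 2) + lam * np.abs(D @ x).sum()


def spdhg_sketch(X, b, D, lam, n_iter=20000, eta0=0.05, sigma=1.0, seed=0):
    """Hypothetical stochastic primal-dual sketch for
        min_x max_{||y||_inf <= lam} (1/2n) ||X x - b||^2 + <D x, y>.
    Update order and step sizes are illustrative assumptions, not the paper's SPDHG.
    Returns (uniformly averaged x, linearly weighted averaged x)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    x = np.zeros(d)
    y = np.zeros(D.shape[0])
    x_unif = np.zeros(d)  # running value of (1/t) * sum_k x_k
    x_wtd = np.zeros(d)   # running value of sum_k k * x_k / sum_k k
    w_sum = 0.0
    samples = rng.integers(0, n, size=n_iter)  # one data index per iteration
    for k, i in enumerate(samples, start=1):
        # Dual step: ascend in y, then apply the prox of r*, which for r = lam*||.||_1
        # is the projection onto the box [-lam, lam] (componentwise clipping).
        y = np.clip(y + sigma * (D @ x), -lam, lam)
        # Primal step: gradient of the i-th sample's loss 0.5*(X[i] @ x - b[i])**2,
        # an unbiased estimate of the full least-squares gradient, plus D^T y.
        grad_i = (X[i] @ x - b[i]) * X[i]
        x = x - (eta0 / np.sqrt(k)) * (grad_i + D.T @ y)
        # Incremental updates of the two averages of the primal iterates.
        x_unif += (x - x_unif) / k
        w_sum += k
        x_wtd += (k / w_sum) * (x - x_wtd)
    return x_unif, x_wtd


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, lam = 200, 10, 0.1
    x_true = np.repeat([1.0, -1.0], 5)  # piecewise-constant signal
    X = rng.standard_normal((n, d))
    b = X @ x_true + 0.1 * rng.standard_normal(n)
    D = first_difference(d)
    x_u, x_w = spdhg_sketch(X, b, D, lam)
    print("F(0)            =", objective(np.zeros(d), X, b, D, lam))
    print("F(uniform avg)  =", objective(x_u, X, b, D, lam))
    print("F(weighted avg) =", objective(x_w, X, b, D, lam))
```

Sampling one data point per iteration keeps the per-step cost independent of n, which is the motivation the abstract gives for a stochastic variant. The dual step stays cheap because projecting onto the ℓ∞ box has a closed form.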