Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization

We consider the stochastic composition optimization problem proposed in \cite{wang2017stochastic}, which has applications ranging from estimation to statistical and machine learning. We propose the first ADMM-based algorithm for this problem, named com-SVR-ADMM, and show that it converges linearly when the objective is strongly convex and Lipschitz smooth. When the objective is convex and Lipschitz smooth, com-SVR-ADMM achieves a convergence rate of $O(\log S / S)$, where $S$ denotes the number of iterations; this improves upon the $O(S^{-4/9})$ rate in \cite{wang2016accelerating}. Moreover, com-SVR-ADMM attains a rate of $O(1/\sqrt{S})$ when the objective is convex but not Lipschitz smooth. Experiments show that it outperforms existing algorithms.
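For context, the composition problem of \cite{wang2017stochastic} combined with an ADMM-style linear constraint can be sketched as follows; the constraint $Ax + By = c$ and the regularizer $h$ are the standard ADMM ingredients, and the exact constraint structure used by com-SVR-ADMM is an assumption here:

\[
\min_{x,\, y}\;\; \mathbb{E}_v\!\left[\, f_v\!\big(\, \mathbb{E}_w\!\left[\, g_w(x) \,\right] \big) \right] + h(y)
\qquad \text{s.t.}\quad Ax + By = c,
\]

where the outer function $f_v$ and the inner function $g_w$ are accessed only through random samples of the indices $v$ and $w$. The compositional structure is what makes unbiased gradient estimation nontrivial, since $\nabla \big( f \circ g \big)(x) = (\partial g(x))^{\top} \nabla f\big(g(x)\big)$ is nonlinear in the inner expectation $\mathbb{E}_w[g_w(x)]$.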

[1] Chao Qian, et al. Accelerated Stochastic ADMM for Empirical Risk Minimization, 2016.

[2] Jiashi Feng, et al. Accelerated Randomized Mirror Descent Algorithms for Composite Non-strongly Convex Optimization, 2016, J. Optim. Theory Appl.

[3] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 2017.

[4] Mengdi Wang, et al. Finite-sum Composition Optimization via Variance Reduced Gradient Descent, 2016, AISTATS.

[5] R. Glowinski, et al. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires, 1975.

[6] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[7] James T. Kwok, et al. Stochastic Variance-Reduced ADMM, 2016.

[8] Alexander G. Gray, et al. Stochastic Alternating Direction Method of Multipliers, 2013, ICML.

[9] Stanley Osher, et al. A Unified Primal-Dual Algorithm Framework Based on Bregman Iteration, 2010, J. Sci. Comput.

[10] Arindam Banerjee, et al. Online Alternating Direction Method (longer version), 2013, ArXiv.

[11] Le Song, et al. Learning from Conditional Distributions via Dual Kernel Embeddings, 2016, ArXiv.

[12] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.

[13] Alexandre M. Baptista, et al. A Comparison of VaR and CVaR Constraints on Portfolio Selection with the Mean-Variance Model, 2004, Manag. Sci.

[14] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.

[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998.

[16] Zeyuan Allen-Zhu. Katyusha: Accelerated Variance Reduction for Faster SGD, 2016, ArXiv.

[17] Le Song, et al. Learning from Conditional Distributions via Dual Embeddings, 2016, AISTATS.

[18] B. Mercier, et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation, 1976.

[19] Cuong V. Nguyen, et al. Accelerated Stochastic Mirror Descent Algorithms for Composite Non-strongly Convex Optimization, 2016.

[20] Mengdi Wang, et al. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions, 2014, Mathematical Programming.

[21] Alexander Shapiro, et al. Lectures on Stochastic Programming: Modeling and Theory, Second Edition, 2014, MOS-SIAM Series on Optimization.

[22] James T. Kwok, et al. Fast-and-Light Stochastic ADMM, 2016, IJCAI.

[23] Taiji Suzuki. Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method, 2013, ICML.

[24] Alexander Shapiro, et al. Lectures on Stochastic Programming: Modeling and Theory, 2009.

[25] Mengdi Wang, et al. Accelerating Stochastic Composition Optimization, 2016, NIPS.

[26] Arindam Banerjee, et al. Online Alternating Direction Method, 2012, ICML.