Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization

Stochastic compositional optimization arises in many important machine learning tasks, such as value function evaluation in reinforcement learning and portfolio management. The objective function is the composition of two expectations of stochastic functions and is more challenging to optimize than the objectives in vanilla stochastic optimization problems. In this paper, we investigate stochastic compositional optimization in the general smooth non-convex setting. We employ the recently developed idea of \textit{Stochastic Recursive Gradient Descent} to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $\mathcal{O}((n+m)^{1/2} \varepsilon^{-2})$ in the finite-sum case and $\mathcal{O}(\varepsilon^{-3})$ in the online case. This complexity is the best known among IFO complexity results for non-convex stochastic compositional optimization and is believed to be optimal. Our experiments validate the theoretical performance of our algorithm.
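For concreteness, the finite-sum problem studied here and the recursive-gradient idea behind SARAH-Compositional can be summarized as follows. The notation below ($n$ outer components $f_i$, $m$ inner components $g_j$, minibatch $\mathcal{S}_t$, step size $\eta$) is introduced only for illustration, and the recursion shown is a simplified sketch of the generic SARAH-type update rather than the exact algorithm; the paper body gives the precise estimators, which recursively track the inner function value, its Jacobian, and the gradient.

% Finite-sum compositional objective: an outer average composed with an inner average.
\[
\min_{x \in \mathbb{R}^d} \; \Phi(x) := f\big(g(x)\big),
\qquad
f(y) = \frac{1}{n}\sum_{i=1}^{n} f_i(y),
\qquad
g(x) = \frac{1}{m}\sum_{j=1}^{m} g_j(x).
\]
% Chain rule: the gradient couples the inner Jacobian with the outer gradient evaluated
% at the inner value g(x), so naively sampled gradients of the composition are biased.
\[
\nabla \Phi(x) = \big(\partial g(x)\big)^{\top} \nabla f\big(g(x)\big).
\]
% SARAH-type recursion (sketch): each minibatch corrects the previous gradient estimate
% instead of re-estimating it from scratch, which reduces the variance of the update.
\[
v_t = \nabla \Phi_{\mathcal{S}_t}(x_t) - \nabla \Phi_{\mathcal{S}_t}(x_{t-1}) + v_{t-1},
\qquad
x_{t+1} = x_t - \eta\, v_t.
\]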
