Optimal Algorithms for Convex Nested Stochastic Composite Optimization

Recently, convex nested stochastic composite optimization (NSCO) has received considerable attention for its applications in reinforcement learning and risk-averse optimization. However, in the current literature, there is a significant gap between the iteration complexities of these NSCO problems and those of simpler stochastic composite optimization problems without the nested structure (e.g., the sum of smooth and nonsmooth functions). In this paper, we close the gap by reformulating a class of convex NSCO problems as "$\min\max\ldots \max$" saddle point problems under mild assumptions and proposing two primal-dual type algorithms with the optimal $\mathcal{O}\{1/\epsilon^2\}$ (resp., $\mathcal{O}\{1/\epsilon\}$) complexity for solving convex (resp., strongly convex) nested problems. More specifically, for the often-considered two-layer smooth-nonsmooth problem, we introduce a simple vanilla stochastic sequential dual (SSD) algorithm that can be implemented purely in the primal form. For the multi-layer problem, we propose a general stochastic sequential dual framework. The framework consists of modular dual updates for different types of functions (smooth, smoothable, nonsmooth, etc.), so that it can handle a more general composition of layer functions. Moreover, we present modular convergence proofs to show that the complexity of the general SSD is optimal with respect to nearly all the problem parameters.
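
As a rough illustration of the "$\min\max\ldots \max$" reformulation mentioned above, consider a two-layer sketch. The notation here ($f_1$, $f_2$, $X$, $\pi_i$, $\Pi_i$) is assumed for illustration and is not taken from the abstract. Suppose $f_1$ is closed, convex, and componentwise non-decreasing, and each component of $f_2$ is closed and convex. Writing each layer through its Fenchel conjugate $f_i^*$ (row by row for the vector-valued $f_2$), one plausible form of the saddle point problem is:

\[
\min_{x \in X} f_1\bigl(f_2(x)\bigr)
\;=\;
\min_{x \in X} \, \max_{\pi_1 \in \Pi_1} \, \max_{\pi_2 \in \Pi_2}
\Bigl\{ \bigl\langle \pi_1,\; \pi_2 x - f_2^*(\pi_2) \bigr\rangle - f_1^*(\pi_1) \Bigr\},
\qquad
\Pi_i := \operatorname{dom} f_i^* .
\]

Under the monotonicity assumption, $\Pi_1$ contains only non-negative vectors, which is what allows the inner maximization over $\pi_2$ to be moved outside the inner product with $\pi_1$. In the stochastic setting each $f_i$ would be accessed only through noisy first-order information, a detail this sketch leaves out.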
