A Single Timescale Stochastic Approximation Method for Nested Stochastic Optimization

We study constrained nested stochastic optimization problems in which the objective function is a composition of two smooth functions whose exact values and derivatives are not available. We propose a single time-scale stochastic approximation algorithm, which we call the Nested Averaged Stochastic Approximation (NASA), to find an approximate stationary point of the problem. The algorithm has two auxiliary averaged sequences (filters) which estimate the gradient of the composite objective function and the inner function value. By using a special Lyapunov function, we show that NASA achieves the sample complexity of ${\cal O}(1/\epsilon^{2})$ for finding an $\epsilon$-approximate stationary point, thus outperforming all extant methods for nested stochastic approximation. Our method and its analysis are the same for both unconstrained and constrained problems, without any need of batch samples for constrained nonconvex stochastic optimization. We also present a simplified variant of the NASA method for solving constrained single level stochastic optimization problems, and we prove the same complexity result for both unconstrained and constrained problems.

[1]  A. Ruszczynski,et al.  Stochastic approximation method with gradient averaging for unconstrained problems , 1983 .

[2]  Andrzej Ruszczynski,et al.  A Linearization Method for Nonsmooth Stochastic Programming Problems , 1987, Math. Oper. Res..

[3]  A. Ruszczynski,et al.  Nonlinear Optimization , 2006 .

[4]  A. Juditsky,et al.  Solving variational inequalities with Stochastic Mirror-Prox algorithm , 2008, 0809.0815.

[5]  Lin Xiao,et al.  Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..

[6]  Angelia Nedic,et al.  Regularized Iterative Stochastic Approximation Methods for Stochastic Variational Inequality Problems , 2013, IEEE Transactions on Automatic Control.

[7]  Yuri M. Ermoliev,et al.  Sample Average Approximation Method for Compound Stochastic Optimization Problems , 2013, SIAM J. Optim..

[8]  Saeed Ghadimi,et al.  Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..

[9]  Furong Huang,et al.  Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.

[10]  A. Ruszczynski,et al.  Statistical estimation of composite risk functionals and risk optimization problems , 2015, 1504.02658.

[11]  Saeed Ghadimi,et al.  Accelerated gradient methods for nonconvex nonlinear and stochastic programming , 2013, Mathematical Programming.

[12]  Saeed Ghadimi,et al.  Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.

[13]  Mengdi Wang,et al.  Accelerating Stochastic Composition Optimization , 2016, NIPS.

[14]  Mengdi Wang,et al.  Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions , 2014, Mathematical Programming.

[15]  Mengdi Wang,et al.  Finite-sum Composition Optimization via Variance Reduced Gradient Descent , 2016, AISTATS.

[16]  Le Song,et al.  Learning from Conditional Distributions via Dual Embeddings , 2016, AISTATS.

[17]  Alfredo N. Iusem,et al.  Extragradient Method with Variance Reduction for Stochastic Variational Inequalities , 2017, SIAM J. Optim..

[18]  J. Blanchet,et al.  Unbiased Simulation for Optimizing Stochastic Function Compositions , 2017, 1711.07564.

[19]  Mingrui Liu,et al.  Solving Weakly-Convex-Weakly-Concave Saddle-Point Problems as Weakly-Monotone Variational Inequality , 2018 .

[20]  Ethan X. Fang,et al.  Multi-Level Stochastic Gradient Methods for Nested Compositon Optimization , 2018 .

[21]  Mingrui Liu,et al.  Solving Weakly-Convex-Weakly-Concave Saddle-Point Problems as Successive Strongly Monotone Variational Inequalities , 2018 .

[22]  Damek Davis,et al.  Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems , 2017, SIAM J. Optim..

[23]  Dmitriy Drusvyatskiy,et al.  Stochastic model-based minimization of weakly convex functions , 2018, SIAM J. Optim..

[24]  Liu Liu,et al.  Dualityfree Methods for Stochastic Composition Optimization , 2017, IEEE Transactions on Neural Networks and Learning Systems.