Multi-Level Composite Stochastic Optimization via Nested Variance Reduction

We consider multi-level composite optimization problems in which each mapping in the composition is either the expectation of a random smooth mapping or a finite sum of smooth mappings. We present a normalized proximal approximate gradient (NPAG) method whose approximate gradients are obtained via nested stochastic variance reduction. To find an approximate stationary point, i.e., a point at which the expected norm of the gradient mapping is at most $\epsilon$, the total sample complexity of our method is $O(\epsilon^{-3})$ in the expectation case and $O(N+\sqrt{N}\epsilon^{-2})$ in the finite-sum case, where $N$ is the total number of functions across all composition levels. Moreover, the dependence of the total sample complexity on the number of composition levels is polynomial, rather than exponential as in previous work.
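
For concreteness, the problem class and the stationarity measure referenced above can be written out as follows; the notation ($f_t$, $T$, $r$, $F$, $\mathcal{G}_\eta$) is ours rather than necessarily the paper's, with $r$ assumed to be a convex, possibly nonsmooth regularizer handled by the proximal step:

$$\min_{x\in\mathbb{R}^d}\; \Phi(x) := f_T\bigl(f_{T-1}(\cdots f_1(x)\cdots)\bigr) + r(x),$$

where each layer $f_t$ is either an expectation $\mathbb{E}_{\xi_t}\bigl[f_{t,\xi_t}\bigr]$ or a finite sum $\frac{1}{N_t}\sum_{i=1}^{N_t} f_{t,i}$ of smooth mappings. Writing $F(x) := f_T(f_{T-1}(\cdots f_1(x)\cdots))$ for the smooth part, the gradient mapping with step size $\eta>0$ is the standard proximal-gradient residual

$$\mathcal{G}_\eta(x) := \frac{1}{\eta}\Bigl(x - \operatorname{prox}_{\eta r}\bigl(x - \eta\,\nabla F(x)\bigr)\Bigr),$$

and an $\epsilon$-approximate stationary point is a point $x$ satisfying $\mathbb{E}\bigl[\|\mathcal{G}_\eta(x)\|\bigr] \le \epsilon$. When $r \equiv 0$, $\mathcal{G}_\eta(x)$ reduces to $\nabla F(x)$, recovering the usual gradient-norm criterion.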
