Solving Stochastic Compositional Optimization is Nearly as Easy as Solving Stochastic Optimization

Stochastic compositional optimization generalizes classic (non-compositional) stochastic optimization to the minimization of compositions of functions, where each level of composition may introduce an additional expectation and the expectations may be nested. Stochastic compositional optimization is gaining popularity in applications such as reinforcement learning and meta-learning. This paper presents a new Stochastically Corrected Stochastic Compositional gradient method (SCSC). SCSC runs in a single time scale with a single loop, uses a fixed batch size, and is guaranteed to converge at the same rate as stochastic gradient descent (SGD) for non-compositional stochastic optimization. This is achieved through a careful correction of a popular stochastic compositional gradient method. Standard SGD acceleration techniques can readily be applied to SCSC, which allows it to achieve state-of-the-art performance for stochastic compositional optimization. In particular, applying Adam to SCSC yields a convergence rate that matches that of the original Adam on non-compositional stochastic optimization. We evaluate SCSC on model-agnostic meta-learning tasks.
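
To make the setting concrete, below is a minimal sketch of a single-loop, single-time-scale corrected compositional gradient iteration for the two-level problem min_x f(E[g(x; ξ)]). The correction added to the inner-value tracker is one common way to keep that estimate accurate with a fixed batch size; the function handles (`grad_f`, `g`, `jac_g`, `sample`) and the exact form of the update are assumptions of this sketch, not necessarily the paper's precise SCSC rule.

```python
import numpy as np

def corrected_compositional_sgd(grad_f, g, jac_g, sample, x0,
                                lr=0.01, beta=0.9, iters=1000):
    """Hedged sketch of a corrected compositional gradient method.

    grad_f(y):    gradient of the outer function f at y
    g(x, xi):     stochastic inner map evaluated at sample/batch xi
    jac_g(x, xi): Jacobian of g with respect to x at sample/batch xi
    sample():     draws one fixed-size mini-batch xi
    """
    x = np.asarray(x0, dtype=float)
    y = g(x, sample())                      # initialize the inner-value tracker
    for _ in range(iters):
        xi = sample()                       # one fixed-size batch per iteration
        g_old = g(x, xi)
        # chain-rule gradient estimate: J_g(x)^T grad_f(y)
        x_new = x - lr * jac_g(x, xi).T @ grad_f(y)
        # corrected moving average: shift the tracker by g(x_new) - g(x_old)
        # before averaging, so y follows the moving target E[g(x_k)]
        y = (1 - beta) * (y + g(x_new, xi) - g_old) + beta * g(x_new, xi)
        x = x_new
    return x
```

Because every iteration uses a single fixed-size batch and one step size, this style of update keeps the per-iteration cost of plain SGD while compensating for the bias introduced by plugging a noisy inner-value estimate into the outer gradient.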
