Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization

We develop two new stochastic Gauss-Newton algorithms for solving a class of nonconvex stochastic compositional optimization problems that frequently arise in practice. We consider both the expectation and finite-sum settings under standard assumptions, and use both classical stochastic and SARAH estimators to approximate function values and Jacobians. In the expectation case, we establish $\mathcal{O}(\varepsilon^{-2})$ iteration complexity to reach a stationary point in expectation and estimate the total number of stochastic oracle calls for both the function value and its Jacobian, where $\varepsilon$ is the desired accuracy. In the finite-sum case, we also establish $\mathcal{O}(\varepsilon^{-2})$ iteration complexity and estimate the total number of oracle calls with high probability. To the best of our knowledge, this is the first time such a global stochastic oracle complexity has been established for stochastic Gauss-Newton methods. Finally, we illustrate our theoretical results with two numerical examples on both synthetic and real datasets.
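To make the SARAH estimator mentioned above concrete, the following is a minimal sketch of a recursive function-value estimate in a finite-sum setting, assuming a mapping of the form $F(x) = \frac{1}{n}\sum_{i=1}^n F_i(x)$. The helper names (`full_average`, `sarah_update`, `make_component`) and the toy components are hypothetical and chosen only for illustration; this is not the algorithm analyzed in the paper, only the generic SARAH recursion applied to function values. A Jacobian estimate could be maintained by the same recursion.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Component = Callable[[Vector], Vector]


def full_average(components: Sequence[Component], x: Vector) -> Vector:
    """Exact value F(x) = (1/n) * sum_i F_i(x), used at a snapshot point."""
    n = len(components)
    values = [f(x) for f in components]
    dim = len(values[0])
    return [sum(v[j] for v in values) / n for j in range(dim)]


def sarah_update(
    components: Sequence[Component],
    prev_estimate: Vector,
    x_new: Vector,
    x_old: Vector,
    batch: Sequence[int],
) -> Vector:
    """One SARAH step: previous estimate plus the mini-batch mean of
    F_i(x_new) - F_i(x_old)."""
    b = len(batch)
    dim = len(prev_estimate)
    correction = [0.0] * dim
    for i in batch:
        new_val = components[i](x_new)
        old_val = components[i](x_old)
        for j in range(dim):
            correction[j] += (new_val[j] - old_val[j]) / b
    return [prev_estimate[j] + correction[j] for j in range(dim)]


def make_component(a: float) -> Component:
    """Hypothetical toy component F_i(x) = [a * x[0]**2, a * x[1]]."""
    return lambda x: [a * x[0] ** 2, a * x[1]]


components = [make_component(a) for a in (1.0, 2.0, 3.0, 4.0)]
x0, x1 = [1.0, 1.0], [0.5, 2.0]

# Snapshot: exact evaluation at the starting point.
estimate0 = full_average(components, x0)

# Recursive step: reuse the previous estimate and correct it on a mini-batch.
rng = random.Random(0)
batch = rng.sample(range(len(components)), 2)
estimate1 = sarah_update(components, estimate0, x1, x0, batch)
```

In this sketch the full average is computed only at the snapshot; every later step touches just the sampled components, evaluated at both the new and the previous iterate. When the mini-batch contains every index, the recursion reproduces the exact value at the new point, which serves as a simple sanity check.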
