Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods

We study the well-known Cubic Newton method in the stochastic setting and propose a general framework for variance reduction, which we call the helper framework. Previous work proposed these methods with very large batch sizes (for both gradients and Hessians) and under varied, often strong assumptions. In this work, we investigate whether such methods can be used without large batches, relying on a small set of simple assumptions that suffice for all our methods. We also study these methods applied to gradient-dominated functions. In the general case, we show improved convergence (compared to first-order methods) to an approximate local minimum; for gradient-dominated functions, we show convergence to an approximate global minimum. See the sketch below for the standard forms of the objects involved.
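For concreteness, here is a minimal sketch of the objects the abstract refers to, in their standard textbook forms (the cubic-regularized Newton model of Nesterov and Polyak, a gradient-dominance condition of degree alpha, and an SVRG-style variance-reduced gradient estimator). The symbols M, \tau_f, \alpha, and the snapshot point \tilde{x} below are illustrative, and the paper's exact stochastic variants and helper construction may differ:

    % Cubic Newton step: minimize a cubically regularized second-order model of f
    x_{k+1} \in \arg\min_{y} \; \langle \nabla f(x_k),\, y - x_k \rangle
        + \tfrac{1}{2} \langle \nabla^2 f(x_k)(y - x_k),\, y - x_k \rangle
        + \tfrac{M}{6} \| y - x_k \|^3

    % Gradient dominance of degree \alpha \in [1, 2]: a small gradient norm implies
    % a small optimality gap, so approximate stationary points are near-global minima
    f(x) - f^{\star} \;\le\; \tau_f \, \| \nabla f(x) \|^{\alpha} \qquad \text{for all } x

    % Stochastic setting: \nabla f and \nabla^2 f are replaced by estimates, e.g. an
    % SVRG-style variance-reduced gradient built from a snapshot point \tilde{x}
    g_k = \nabla f_{i_k}(x_k) - \nabla f_{i_k}(\tilde{x}) + \nabla f(\tilde{x})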
