Adaptive Stochastic Optimization: A Framework for Analyzing Stochastic Optimization Algorithms

Optimization lies at the heart of machine learning (ML) and signal processing (SP). Contemporary approaches based on the stochastic gradient (SG) method are nonadaptive in the sense that their implementations employ prescribed parameter values that must be tuned for each application. This article summarizes recent research and motivates future work on adaptive stochastic optimization methods, which have the potential to offer significant computational savings when training large-scale systems.
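
To make the nonadaptive/adaptive distinction concrete, here is a minimal sketch (not taken from the article) on an assumed least-squares toy problem. It contrasts an SG loop with a prescribed constant step size against a simple adaptive variant that chooses its step size on the fly via an Armijo-type backtracking line search on each sampled minibatch, in the spirit of the stochastic line-search methods the article surveys. All function names and parameter values (alpha, beta, c, batch_size) are illustrative choices, not the article's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: f(w) = (1/2n) * ||A w - b||^2  (assumed for illustration).
n, d = 1000, 10
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.1 * rng.normal(size=n)

def minibatch_grad(w, batch):
    """Stochastic gradient of f estimated on a sampled minibatch."""
    Ab, bb = A[batch], b[batch]
    return Ab.T @ (Ab @ w - bb) / len(batch)

def minibatch_loss(w, batch):
    """Loss of f estimated on the same minibatch."""
    Ab, bb = A[batch], b[batch]
    return 0.5 * np.mean((Ab @ w - bb) ** 2)

# (1) Nonadaptive SG: the step size alpha is a prescribed constant
#     that must be tuned by hand for each application.
def sgd_fixed(w, steps=200, alpha=0.05, batch_size=32):
    for _ in range(steps):
        batch = rng.choice(n, size=batch_size, replace=False)
        w = w - alpha * minibatch_grad(w, batch)
    return w

# (2) A simple adaptive variant: backtracking line search on the sampled
#     minibatch (Armijo-type sufficient-decrease test), so the step size
#     adapts to the local landscape instead of being fixed in advance.
def sgd_line_search(w, steps=200, alpha0=1.0, beta=0.5, c=1e-4, batch_size=32):
    for _ in range(steps):
        batch = rng.choice(n, size=batch_size, replace=False)
        g = minibatch_grad(w, batch)
        f0, alpha = minibatch_loss(w, batch), alpha0
        # Shrink alpha until the sampled loss decreases sufficiently.
        while minibatch_loss(w - alpha * g, batch) > f0 - c * alpha * (g @ g):
            alpha *= beta
        w = w - alpha * g
    return w

w0 = np.zeros(d)
print("fixed-step SG loss :", 0.5 * np.mean((A @ sgd_fixed(w0) - b) ** 2))
print("line-search SG loss:", 0.5 * np.mean((A @ sgd_line_search(w0) - b) ** 2))
```

The sketch is only meant to show where the tunable constant disappears: the adaptive loop still has hyperparameters (alpha0, beta, c), but they are far less problem-sensitive than a single prescribed step size.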
