Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming

In this paper, we introduce a new stochastic approximation (SA) type algorithm, namely the randomized stochastic gradient (RSG) method, for solving an important class of nonlinear (possibly nonconvex) stochastic programming (SP) problems. We establish the complexity of this method for computing an approximate stationary point of a nonlinear programming problem. We also show that this method possesses a nearly optimal rate of convergence if the problem is convex. We then discuss a variant of the algorithm that applies a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the RSG method, and show that this modification significantly improves the large-deviation properties of the algorithm. Finally, these methods are specialized to a class of simulation-based optimization problems in which only stochastic zeroth-order information is available.
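The abstract names three mechanisms without spelling them out: stopping stochastic gradient descent at a randomly chosen iterate, a post-optimization phase over several independent runs, and a zeroth-order variant. The Python sketch below shows one plausible instantiation of each. It rests on assumptions that the abstract does not state. The iterate index R is drawn with probability proportional to `2*gamma_k - L*gamma_k**2`, which requires `0 < gamma_k < 2/L`. The zeroth-order oracle uses a Gaussian-smoothing finite difference. The post-optimization phase keeps the candidate whose sample-averaged stochastic gradient has the smallest norm. The names `rsg`, `zeroth_order_oracle`, `two_phase_rsg`, the toy quadratic problem, and the parameters `gammas`, `S`, `T` and `mu` are hypothetical illustrations. They are not the paper's code or its recommended step-size policy.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Stochastic first-order oracle: returns G(x, xi) for a freshly sampled xi.
GradOracle = Callable[[Vector, random.Random], Vector]


def rsg(grad: GradOracle, x1: Vector, L: float,
        gammas: Sequence[float], rng: random.Random) -> Vector:
    """Randomized stochastic gradient (sketch): run SGD and stop at a random iterate.

    R is drawn with P(R = k) proportional to 2*gamma_k - L*gamma_k**2, so every
    gamma_k must lie in (0, 2/L). Only R updates are needed, so we stop there.
    """
    weights = [2 * g - L * g * g for g in gammas]
    R = rng.choices(range(len(gammas)), weights=weights)[0]
    x = list(x1)
    for k in range(R):
        g = grad(x, rng)
        x = [a - gammas[k] * b for a, b in zip(x, g)]
    return x


def zeroth_order_oracle(value: Callable[[Vector, Vector], float],
                        sample_xi: Callable[[random.Random], Vector],
                        mu: float) -> GradOracle:
    """Gaussian-smoothing gradient estimate built from noisy function values only.

    G = (F(x + mu*u, xi) - F(x, xi)) / mu * u with u ~ N(0, I); both
    evaluations share the same sample xi.
    """
    def grad(x: Vector, rng: random.Random) -> Vector:
        xi = sample_xi(rng)
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [a + mu * b for a, b in zip(x, u)]
        scale = (value(shifted, xi) - value(x, xi)) / mu
        return [scale * b for b in u]
    return grad


def two_phase_rsg(grad: GradOracle, x1: Vector, L: float, gammas: Sequence[float],
                  S: int, T: int, rng: random.Random) -> Vector:
    """Post-optimization (sketch): S independent RSG runs, then keep the
    candidate whose T-sample averaged gradient has the smallest norm."""
    candidates = [rsg(grad, x1, L, gammas, rng) for _ in range(S)]

    def estimated_grad_norm(x: Vector) -> float:
        samples = [grad(x, rng) for _ in range(T)]
        avg = [sum(col) / T for col in zip(*samples)]
        return math.sqrt(sum(c * c for c in avg))

    return min(candidates, key=estimated_grad_norm)


# Hypothetical toy problem: F(x, xi) = 0.5*||x||^2 + <xi, x>, so f(x) = 0.5*||x||^2 and L = 1.
def toy_value(x: Vector, xi: Vector) -> float:
    return 0.5 * sum(a * a for a in x) + sum(s * a for s, a in zip(xi, x))


def toy_sample_xi(rng: random.Random) -> Vector:
    return [rng.gauss(0.0, 0.1) for _ in range(2)]


def toy_first_order(x: Vector, rng: random.Random) -> Vector:
    # Unbiased stochastic gradient of the toy problem: grad_x F(x, xi) = x + xi.
    return [a + s for a, s in zip(x, toy_sample_xi(rng))]


rng = random.Random(0)
x1 = [5.0, -3.0]
x_fo = two_phase_rsg(toy_first_order, x1, L=1.0, gammas=[0.1] * 200, S=5, T=20, rng=rng)
zo = zeroth_order_oracle(toy_value, toy_sample_xi, mu=1e-4)
x_zo = two_phase_rsg(zo, x1, L=1.0, gammas=[0.02] * 1000, S=5, T=50, rng=rng)
print(math.hypot(*x_fo), math.hypot(*x_zo))
```

With a constant step size all weights are equal, so R is uniform over the iterates. The weighting only matters when the step sizes vary. The zeroth-order run uses a smaller step size because the Gaussian-smoothing estimate is noisier than a direct stochastic gradient. Reusing the same `xi` in both function evaluations keeps the sample noise from being divided by `mu`.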
