Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization

In this paper we study stochastic quasi-Newton methods for nonconvex stochastic optimization, where we assume that only noisy gradient information about the objective function is available through a stochastic first-order oracle (SFO). First, we propose a general framework of stochastic quasi-Newton methods for solving nonconvex stochastic optimization problems. The framework extends classical quasi-Newton methods from the deterministic setting to the stochastic setting, and we prove that it converges almost surely to stationary points. Second, we propose a general framework for a class of randomized stochastic quasi-Newton methods, in which the number of iterations performed by the algorithm is a random variable, and we analyze the worst-case SFO-call complexity of this class. Third, we present two specific methods that fall into this framework: a stochastic damped BFGS method and a stochastic cyclic Barzilai-Borwein method. Finally, we report numerical results that demonstrate the efficiency of the proposed methods.
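To illustrate the kind of safeguard a damped BFGS method relies on, the sketch below implements the classical Powell damping rule for a BFGS update of a Hessian approximation. It is a minimal illustration under stated assumptions, not the method proposed in the paper: the function name `damped_bfgs_update`, the threshold `damping = 0.2`, and the choice to update the Hessian approximation B rather than its inverse are all assumptions made here. In a stochastic setting, `s` would be the difference of successive iterates and `y` the difference of stochastic gradients, which may violate the curvature condition s^T y > 0 on nonconvex problems.

```python
def damped_bfgs_update(B, s, y, damping=0.2):
    """Return a damped BFGS update of the symmetric positive definite matrix B.

    B is a list of rows, s is the step x_new - x_old, and y is the difference
    of (possibly stochastic) gradients. The names and the default threshold
    are illustrative assumptions, not taken from the paper.
    """
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    sBs = sum(s[i] * Bs[i] for i in range(n))
    sy = sum(s[i] * y[i] for i in range(n))

    # A zero step carries no curvature information, so keep B unchanged.
    if sBs <= 0.0:
        return [row[:] for row in B]

    # Powell damping: use y as is when the curvature s^T y is large enough,
    # otherwise blend y with B s so that s^T r = damping * s^T B s > 0.
    if sy >= damping * sBs:
        theta = 1.0
    else:
        theta = (1.0 - damping) * sBs / (sBs - sy)
    r = [theta * y[i] + (1.0 - theta) * Bs[i] for i in range(n)]
    sr = sum(s[i] * r[i] for i in range(n))

    # Standard BFGS formula with y replaced by the damped vector r.
    return [
        [B[i][j] - Bs[i] * Bs[j] / sBs + r[i] * r[j] / sr for j in range(n)]
        for i in range(n)
    ]


# Usage: a pair with negative curvature (s^T y < 0) still yields a
# positive definite update because the damping kicks in.
B_new = damped_bfgs_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [-1.0, 0.0])
```

In this usage example the undamped BFGS formula would be undefined or indefinite because s^T y = -1, whereas the damped update keeps the diagonal entry along `s` positive (approximately 0.2) and leaves the orthogonal direction unchanged.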
