Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization

In this paper we study stochastic quasi-Newton methods for nonconvex stochastic optimization, where we assume that only stochastic information about the gradients of the objective function is available via a stochastic first-order oracle (SFO). First, we propose a general framework of stochastic quasi-Newton methods for solving nonconvex stochastic optimization problems. The proposed framework extends the classic quasi-Newton methods from deterministic settings to stochastic settings, and we prove its almost sure convergence to stationary points. Second, we propose a general framework for a class of randomized stochastic quasi-Newton methods, in which the number of iterations conducted by the algorithm is a random variable, and we analyze the worst-case SFO-call complexity of this class of methods. Third, we present two specific methods that fall into this framework, namely a stochastic damped-BFGS method and a stochastic cyclic Barzilai-Borwein method. Finally, we report numerical results to demonstrate the efficiency of the proposed methods.
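To make the "damped-BFGS" idea mentioned in the abstract more concrete, here is a minimal sketch of the classical Powell damping rule applied to a Hessian approximation B. It is an illustration under stated assumptions, not the paper's algorithm: the function name `damped_bfgs_update`, the threshold `delta = 0.2`, and the toy vectors are all chosen here for illustration. In a stochastic setting, `s` would be the step between two iterates and `y` the difference of two stochastic gradients. Because such a `y` can report zero or negative curvature, damping blends `y` with `B s` so that the updated matrix stays positive definite.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def damped_bfgs_update(B, s, y, delta=0.2):
    """Powell-damped BFGS update of a Hessian approximation B.

    Assumes B is symmetric positive definite and s is nonzero.
    delta is an illustrative damping threshold (0.2 is the classical choice).
    """
    Bs = matvec(B, s)
    sBs = dot(s, Bs)   # > 0 because B is positive definite
    sy = dot(s, y)     # measured curvature; may be <= 0 with noisy gradients
    # Keep y when the curvature is large enough; otherwise blend it with B s.
    if sy >= delta * sBs:
        theta = 1.0
    else:
        theta = (1.0 - delta) * sBs / (sBs - sy)
    r = [theta * yi + (1.0 - theta) * bi for yi, bi in zip(y, Bs)]
    sr = dot(s, r)     # by construction sr >= delta * sBs > 0
    n = len(s)
    # Standard BFGS formula with y replaced by the damped vector r.
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + r[i] * r[j] / sr
             for j in range(n)] for i in range(n)]


# Toy usage: a curvature pair with s^T y < 0, as noisy gradients can produce.
B = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.0]
y = [-1.0, 0.5]
B_new = damped_bfgs_update(B, s, y)
```

With the damped vector `r` in place of `y`, the update satisfies `B_new s = r` and `s^T r >= delta * s^T B s`, which is what keeps `B_new` positive definite even for the negative-curvature pair above. A quasi-Newton step would then move along the solution `d` of `B_new d = g`, where `g` is a stochastic gradient. Damping alone does not bound the eigenvalues of `B_new`, so a practical method needs additional safeguards.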

Shiqian Ma | Wei Liu | Xiao Wang | Donald Goldfarb


[30] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .

[31] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..

[32] Guillermo Sapiro,et al. Online dictionary learning for sparse coding , 2009, ICML '09.

[33] William W. Hager,et al. An affine-scaling interior-point CBB method for box-constrained optimization , 2009, Math. Program..

[34] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[35] Ya-Xiang Yuan,et al. Optimization Theory and Methods: Nonlinear Programming , 2010 .

[36] Andrew W. Fitzgibbon,et al. A fast natural Newton method , 2010, ICML.

[37] Jorge Nocedal,et al. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning , 2011, SIAM J. Optim..

[38] Deng Cai,et al. Manifold Adaptive Experimental Design for Text Categorization , 2012, IEEE Transactions on Knowledge and Data Engineering.

[39] Guanghui Lan,et al. An optimal method for stochastic composite optimization , 2011, Mathematical Programming.

[40] Saeed Ghadimi,et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework , 2012, SIAM J. Optim..

[41] Mark W. Schmidt,et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets , 2012, ArXiv.

[42] Alexander Shapiro,et al. Validation analysis of mirror descent stochastic approximation method , 2012, Math. Program..

[43] Ohad Shamir,et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes , 2012, ICML.

[44] Eric Moulines,et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.

[45] Saeed Ghadimi,et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization, II: Shrinking Procedures and Optimal Algorithms , 2013, SIAM J. Optim..

[46] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[47] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..

[48] Xin Zhang,et al. Gradient Type Optimization Methods For Electronic Structure Calculations , 2013, SIAM J. Sci. Comput..

[49] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.

[50] Xiaojing Ye,et al. Accelerated barrier optimization compressed sensing (ABOCS) for CT reconstruction with improved convergence. , 2014, Physics in medicine and biology.

[51] Aryan Mokhtari,et al. RES: Regularized Stochastic BFGS Algorithm , 2014, IEEE Transactions on Signal Processing.

[52] Lin Xiao,et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim..

[53] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .

[54] Francis R. Bach,et al. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression , 2013, J. Mach. Learn. Res..

[55] Aryan Mokhtari,et al. Global convergence of online limited memory BFGS , 2014, J. Mach. Learn. Res..

[56] Thomas Hofmann,et al. A Variance Reduced Stochastic Newton Method , 2015, ArXiv.

[57] Guanghui Lan,et al. Stochastic Block Mirror Descent Methods for Nonsmooth and Stochastic Optimization , 2013, SIAM J. Optim..

[58] Michael I. Jordan,et al. A Linearly-Convergent Stochastic L-BFGS Algorithm , 2015, AISTATS.

[59] Saeed Ghadimi,et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming , 2013, Mathematical Programming.

[60] Robert M. Gower,et al. Stochastic Block BFGS: Squeezing More Curvature out of Data , 2016, ICML.

[61] Jorge Nocedal,et al. A Stochastic Quasi-Newton Method for Large-Scale Optimization , 2014, SIAM J. Optim..

[62] Saeed Ghadimi,et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.

[63] Zeyuan Allen Zhu,et al. Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.

[64] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.

[65] Shiqian Ma,et al. Penalty methods with stochastic approximation for stochastic nonlinear programming , 2013, Math. Comput..