RES: Regularized Stochastic BFGS Algorithm

RES, a regularized stochastic version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method, is proposed to solve strongly convex optimization problems with stochastic objectives. Stochastic gradient descent algorithms are widely used, but the number of iterations they need to approximate optimal arguments can be prohibitive in high-dimensional problems. Second-order methods, on the other hand, are impractical because computing inverses of the objective's Hessian is too expensive. BFGS modifies gradient descent by introducing a Hessian approximation matrix computed from finite gradient differences; RES uses stochastic gradients in lieu of deterministic gradients both to determine descent directions and to approximate the objective's curvature. Since stochastic gradients can be computed at manageable cost, RES is realizable and retains the convergence-rate advantages of its deterministic counterpart. Convergence results show that lower and upper bounds on the Hessian eigenvalues of the sample functions are sufficient to guarantee almost sure convergence of a subsequence generated by RES and convergence in expectation of the whole sequence to the optimal arguments. Numerical experiments show reductions in convergence time relative to stochastic gradient descent and to non-regularized stochastic versions of BFGS. An application of RES to the implementation of support vector machines is also developed.
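
The abstract states the ingredients of RES without writing out the iteration. Purely as an illustration, the Python sketch below shows one plausible shape of such a regularized stochastic BFGS loop: the same random sample drives both the descent direction and the curvature pair, and two small regularization constants (here called delta and Gamma) bias the curvature matrix and the step toward the identity. All names, constants, and the diminishing step-size schedule are assumptions made for the sketch, not identifiers or values taken from the paper.

```python
import numpy as np

def res(stoch_grad, draw_sample, w0, n_iter=500,
        eps0=0.5, tau=100.0, delta=1e-3, Gamma=1e-4):
    """Illustrative regularized stochastic BFGS loop (names/constants assumed).

    stoch_grad(w, theta) -- stochastic gradient of the objective at w
                            for the random sample theta.
    draw_sample()        -- draws one random sample theta.
    delta, Gamma         -- regularization constants keeping the curvature
                            estimate and the step well conditioned.
    """
    w = np.asarray(w0, dtype=float).copy()
    n = w.size
    B = np.eye(n)                                  # Hessian approximation
    for t in range(n_iter):
        eps = eps0 * tau / (tau + t)               # diminishing step size
        theta = draw_sample()
        g = stoch_grad(w, theta)
        # Descent direction: B^{-1} g plus an identity bias Gamma*g, so the
        # step cannot vanish even if B becomes ill conditioned.
        d = np.linalg.solve(B, g) + Gamma * g
        w_next = w - eps * d
        # Curvature pair built from the SAME sample theta; subtracting
        # delta*v regularizes the stochastic gradient variation.
        v = w_next - w
        r = stoch_grad(w_next, theta) - g - delta * v
        if v @ r > 1e-12:                          # skip noise-corrupted pairs
            B = (B + np.outer(r, r) / (v @ r)
                   - np.outer(B @ v, B @ v) / (v @ B @ v)
                   + delta * np.eye(n))            # identity bias on curvature
        w = w_next
    return w

if __name__ == "__main__":
    # Toy usage: stochastic least squares, E[(x'w - y)^2]/2, synthetic data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true + 0.1 * rng.standard_normal(1000)
    w_hat = res(lambda w, i: X[i] * (X[i] @ w - y[i]),   # sample gradient
                lambda: rng.integers(0, 1000),           # sample index
                np.zeros(5))
    print(np.linalg.norm(w_hat - w_true))
```

The `v @ r > 1e-12` guard is a generic safeguard, not something specified by the abstract: when stochastic noise destroys positive curvature, the update is skipped so that B stays positive definite and the solve remains well posed.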
