A Variance Reduced Stochastic Newton Method

Quasi-Newton methods are widely used in practice for convex loss minimization problems. These methods exhibit good empirical performance on a wide variety of tasks and enjoy super-linear convergence to the optimal solution. For large-scale learning problems, stochastic Quasi-Newton methods have recently been proposed. However, these typically achieve only sub-linear convergence rates and have not been shown to consistently perform well in practice, since noisy Hessian approximations can exacerbate the effect of high-variance stochastic gradient estimates. In this work we propose Vite, a novel stochastic Quasi-Newton algorithm that uses an existing first-order technique to reduce this variance. Without relying on the specific form of the approximate Hessian, we show that Vite reaches the optimum at a geometric rate with a constant step size for smooth, strongly convex functions. Empirically, we demonstrate improvements over existing stochastic Quasi-Newton and variance reduced stochastic gradient methods.

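To make the general idea concrete, the following is a minimal sketch of a variance-reduced quasi-Newton iteration of the kind the abstract describes: an SVRG-style gradient estimate, computed against a periodically refreshed snapshot, is preconditioned by an approximate inverse Hessian before taking a constant-length step. The toy least-squares problem, the fixed H_inv matrix standing in for a BFGS-type curvature estimate, and all parameter choices are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Illustrative sketch: SVRG-style variance-reduced gradients preconditioned by an
# approximate inverse Hessian (a quasi-Newton flavored update). All choices below
# (toy problem, fixed H_inv, step size, schedule) are assumptions for exposition.

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    # Gradient of the i-th squared-loss term 0.5 * (a_i^T w - b_i)^2
    return A[i] * (A[i] @ w - b[i])

def full_grad(w):
    # Gradient of the average loss over all n terms
    return A.T @ (A @ w - b) / n

# Crude curvature approximation: here the exact (regularized) inverse Hessian
# stands in for a BFGS/L-BFGS-style estimate.
H_inv = np.linalg.inv(A.T @ A / n + 1e-3 * np.eye(d))

w = np.zeros(d)
step_size = 0.5  # constant step size
for epoch in range(20):
    w_snap = w.copy()           # snapshot point
    mu = full_grad(w_snap)      # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # SVRG-style variance-reduced gradient estimate
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        # Quasi-Newton step: precondition with the approximate inverse Hessian
        w -= step_size * H_inv @ g

print("final gradient norm:", np.linalg.norm(full_grad(w)))
```

The point of combining the two ingredients is that the variance-reduced gradient estimate lets the method keep a constant step size (rather than a decaying one), which is what makes a geometric convergence rate attainable, while the curvature information supplied by the approximate inverse Hessian rescales the step along poorly conditioned directions.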