Paper information - Nesterov's accelerated gradient and momentum as approximations to regularised update descent

Nesterov's accelerated gradient and momentum as approximations to regularised update descent

We present a unifying framework for adapting the update direction in gradient-based iterative optimization methods. As natural special cases we re-derive classical momentum and Nesterov's accelerated gradient method, lending a new intuitive interpretation to the latter algorithm. We show that a new algorithm, which we term Regularised Gradient Descent, can converge more quickly than either Nesterov's algorithm or the classical momentum algorithm.
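As a point of reference for the two classical methods named in the abstract, the sketch below writes out one common velocity formulation of classical momentum and of Nesterov's accelerated gradient. It is illustrative only and does not implement Regularised Gradient Descent, which the abstract does not specify. The quadratic objective, the step size `lr`, the momentum coefficient `mu`, and all function names are assumptions chosen for the example.

```python
import numpy as np

def grad(x, A, b):
    """Gradient of the assumed test objective f(x) = 0.5 * x^T A x - b^T x."""
    return A @ x - b

def classical_momentum(x0, A, b, lr=0.1, mu=0.9, steps=500):
    """Classical momentum: the gradient is evaluated at the current iterate."""
    x = x0.copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        v = mu * v - lr * grad(x, A, b)
        x = x + v
    return x

def nesterov(x0, A, b, lr=0.1, mu=0.9, steps=500):
    """Nesterov's accelerated gradient: the gradient is evaluated at the
    look-ahead point x + mu * v rather than at x."""
    x = x0.copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        v = mu * v - lr * grad(x + mu * v, A, b)
        x = x + v
    return x

# Hypothetical 2-D convex quadratic used only to exercise both updates.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)  # exact minimiser, for comparison

x0 = np.zeros(2)
x_cm = classical_momentum(x0, A, b)
x_nag = nesterov(x0, A, b)
print("classical momentum error:", np.linalg.norm(x_cm - x_star))
print("Nesterov error:", np.linalg.norm(x_nag - x_star))
```

In this form the two updates differ only in where the gradient is evaluated, so they can be compared line by line.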

[1] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.

[2] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.

[3] D. K. Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.

[4] P. Hammond, et al. Essential Mathematics for Economic Analysis, 2002.

[5] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[6] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[7] Yann Ollivier, et al. Speed learning on the fly, 2015, ArXiv.