First-order and second-order variants of gradient descent: a unified framework

In this paper, we provide an overview of first-order and second-order variants of gradient descent commonly used in machine learning. We propose a general framework in which six of these methods can be interpreted as different instances of the same approach. These methods are vanilla gradient descent, the classical and generalized Gauss-Newton methods, natural gradient descent, the gradient covariance matrix approach, and Newton's method. Besides interpreting these methods within a single framework, we explain their specific properties and show under which conditions some of them coincide.
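
To make the unifying view concrete, a standard way to express this kind of unification is as a preconditioned update θ ← θ − η M(θ)⁻¹ ∇L(θ), where M is the identity for vanilla gradient descent, the Hessian for Newton's method, the (classical or generalized) Gauss-Newton matrix, the Fisher information matrix for natural gradient descent, and the covariance of per-sample gradients for the gradient covariance matrix approach. The sketch below is not taken from the paper; it is a minimal NumPy illustration on a hypothetical least-squares problem (the data, names, and step sizes are assumptions made here), chosen so that the Gauss-Newton matrix coincides with both the Hessian and, for a Gaussian likelihood with unit variance, the Fisher matrix, one of the coincidences such a framework makes explicit.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): least-squares regression,
# L(theta) = (1/2n) * ||X @ theta - y||^2, used to illustrate the shared
# update rule  theta <- theta - lr * M^{-1} @ grad L(theta).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def gradient(theta):
    return X.T @ (X @ theta - y) / n

def preconditioner(method):
    if method == "vanilla":
        # Identity metric: plain gradient descent.
        return np.eye(d)
    if method == "gauss_newton":
        # Classical Gauss-Newton matrix J^T J / n, with J = X for a linear
        # model. For this quadratic loss it equals the Hessian (Newton's
        # method), and under a unit-variance Gaussian likelihood it equals
        # the Fisher matrix, so the natural gradient step coincides with it.
        return X.T @ X / n
    raise ValueError(method)

def step(theta, method, lr, damping=1e-8):
    # Unified update: solve (M + damping * I) delta = grad, then step.
    M = preconditioner(method) + damping * np.eye(d)
    return theta - lr * np.linalg.solve(M, gradient(theta))

for method, lr in [("vanilla", 0.5), ("gauss_newton", 1.0)]:
    theta = np.zeros(d)
    for _ in range(100):
        theta = step(theta, method, lr)
    print(method, np.round(theta, 3))
```

For this quadratic loss, the Gauss-Newton step with lr = 1.0 reaches the least-squares solution in a single iteration, whereas vanilla gradient descent needs many small steps; other choices of M (Fisher matrix, per-sample gradient covariance) slot into the same `preconditioner` function without changing the update rule.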
