First-order and second-order variants of gradient descent: a unified framework

In this paper, we provide an overview of first-order and second-order variants of gradient descent commonly used in machine learning. We propose a general framework in which six of these methods can be interpreted as different instances of the same approach. These methods are vanilla gradient descent, the classical and generalized Gauss-Newton methods, natural gradient descent, the gradient covariance matrix approach, and Newton's method. Besides interpreting these methods within a single framework, we explain their specific properties and show under which conditions some of them coincide.
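
To make the unifying view concrete, a standard way to express this kind of unification is as a preconditioned update θ ← θ − η M(θ)⁻¹ ∇L(θ), where M is the identity for vanilla gradient descent, the Hessian for Newton's method, the (classical or generalized) Gauss-Newton matrix, the Fisher information matrix for natural gradient descent, and the covariance of per-sample gradients for the gradient covariance matrix approach. The sketch below is not taken from the paper; it is a minimal NumPy illustration on a hypothetical least-squares problem (the data, names, and step sizes are assumptions made here), chosen so that the Gauss-Newton matrix coincides with both the Hessian and, for a Gaussian likelihood with unit variance, the Fisher matrix, one of the coincidences such a framework makes explicit.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): least-squares regression,
# L(theta) = (1/2n) * ||X @ theta - y||^2, used to illustrate the shared
# update rule  theta <- theta - lr * M^{-1} @ grad L(theta).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def gradient(theta):
    return X.T @ (X @ theta - y) / n

def preconditioner(method):
    if method == "vanilla":
        # Identity metric: plain gradient descent.
        return np.eye(d)
    if method == "gauss_newton":
        # Classical Gauss-Newton matrix J^T J / n, with J = X for a linear
        # model. For this quadratic loss it equals the Hessian (Newton's
        # method), and under a unit-variance Gaussian likelihood it equals
        # the Fisher matrix, so the natural gradient step coincides with it.
        return X.T @ X / n
    raise ValueError(method)

def step(theta, method, lr, damping=1e-8):
    # Unified update: solve (M + damping * I) delta = grad, then step.
    M = preconditioner(method) + damping * np.eye(d)
    return theta - lr * np.linalg.solve(M, gradient(theta))

for method, lr in [("vanilla", 0.5), ("gauss_newton", 1.0)]:
    theta = np.zeros(d)
    for _ in range(100):
        theta = step(theta, method, lr)
    print(method, np.round(theta, 3))
```

For this quadratic loss, the Gauss-Newton step with lr = 1.0 reaches the least-squares solution in a single iteration, whereas vanilla gradient descent needs many small steps; other choices of M (Fisher matrix, per-sample gradient covariance) slot into the same `preconditioner` function without changing the update rule.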
