Rprop Using the Natural Gradient

Gradient-based optimization algorithms are the standard methods for adapting the weights of neural networks. The natural gradient gives the steepest descent direction with respect to a non-Euclidean metric on weight space that is, from a theoretical point of view, more appropriate than the Euclidean one. While the natural gradient has already proven advantageous for online learning, we explore its benefits for batch learning: we empirically compare Rprop (resilient backpropagation), one of the best performing first-order learning algorithms, using the Euclidean and the non-Euclidean metric, respectively. As batch steepest descent along the natural gradient is closely related to Levenberg-Marquardt optimization, we include this method in our comparison.
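To make the combination concrete, the following is a minimal NumPy sketch of one Rprop step driven by the natural gradient, i.e. by G(w)^{-1} grad E(w), where G(w) is the Fisher information matrix. It follows the iRprop- variant (sign-based step-size adaptation without weight backtracking); the function name, the explicit Fisher-matrix argument, and the default hyperparameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def natural_rprop_step(w, grad, fisher, delta, prev_grad,
                       eta_plus=1.2, eta_minus=0.5,
                       delta_min=1e-6, delta_max=50.0):
    """One iRprop--style step driven by the natural gradient (sketch).

    w, grad   : current weights and Euclidean gradient of the error (1-D arrays)
    fisher    : estimate of the Fisher information matrix G(w) (hypothetical input;
                how G is estimated is a separate design choice)
    delta     : per-weight step sizes, adapted by the sign heuristic
    prev_grad : (possibly zeroed) natural gradient from the previous step
    """
    # Natural gradient: premultiply the Euclidean gradient by G(w)^{-1}.
    # Solving the linear system avoids forming the explicit inverse.
    nat_grad = np.linalg.solve(fisher, grad)

    # Standard Rprop sign-based adaptation of the per-weight step sizes:
    # grow delta where the sign is stable, shrink it where the sign flipped.
    sign_change = nat_grad * prev_grad
    delta = np.where(sign_change > 0,
                     np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0,
                     np.maximum(delta * eta_minus, delta_min), delta)

    # iRprop-: where the sign flipped, zero the derivative so that
    # the corresponding weight is not updated in this step.
    nat_grad = np.where(sign_change < 0, 0.0, nat_grad)

    # The weight update uses only the sign of the (natural) gradient.
    w_new = w - np.sign(nat_grad) * delta
    return w_new, delta, nat_grad
```

On the first call, delta would be initialized to a small constant (e.g. 0.1) and prev_grad to zeros; replacing nat_grad by grad recovers ordinary Euclidean Rprop, which is exactly the comparison the paper draws.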
