Implementing online natural gradient learning: problems and solutions

Online natural gradient learning is an efficient algorithm that addresses the slow convergence and poor performance of the standard gradient descent method. Implementing it, however, raises several practical problems. In this paper we propose a new algorithm that solves these problems and compare it with other known online learning algorithms, including the Almeida-Langlois-Amaral-Plakhov (ALAP) algorithm, Vario-eta, local adaptive learning rates, and learning with momentum, using benchmark data sets from Proben1 and normalized handwritten digits automatically scanned from envelopes by the U.S. Postal Service. We analyze the strengths and weaknesses of these algorithms and test them empirically. We find that the online training error is not an appropriate criterion for deciding whether the learning rate should be changed, and that our new algorithm outperforms the existing online algorithms.
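The paper's own algorithm is not reproduced here. As a rough illustration of the method being adapted, the following is a minimal sketch of online natural gradient descent for a linear Gaussian model, maintaining a running inverse-Fisher estimate with a Sherman-Morrison rank-one update in the spirit of Amari, Park and Fukumizu's adaptive scheme. The step size, averaging rate, and toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)   # hypothetical ground-truth weights
w = np.zeros(d)               # parameters learned online

G_inv = np.eye(d)             # running estimate of the inverse Fisher matrix
eta, eps = 0.05, 0.01         # step size and Fisher averaging rate (assumed values)

for t in range(2000):
    # one streaming example from a noisy linear teacher
    x = rng.normal(size=d)
    y = w_true @ x + 0.1 * rng.normal()

    err = w @ x - y
    grad = err * x            # gradient of the per-example loss 0.5 * err**2

    # For this Gaussian linear model the per-example Fisher contribution is
    # x x^T (the noise variance is absorbed into eta). Track
    #     G <- (1 - eps) * G + eps * x x^T
    # but update G_inv directly via the Sherman-Morrison identity, so no
    # matrix inversion is ever performed.
    A_inv = G_inv / (1.0 - eps)
    Ax = A_inv @ x
    G_inv = A_inv - eps * np.outer(Ax, Ax) / (1.0 + eps * x @ Ax)

    # natural gradient step: precondition the plain gradient by G_inv
    w -= eta * (G_inv @ grad)

print("parameter error:", np.linalg.norm(w - w_true))
```

The rank-one inverse update is what makes such a scheme feasible online: each step costs O(d^2) rather than the O(d^3) of inverting the Fisher matrix from scratch, which is one of the implementation obstacles the natural gradient literature highlights.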
