Natural Gradient Descent for Training Multi-Layer Perceptrons

The main difficulty in implementing the natural gradient learning rule is computing the inverse of the Fisher information matrix when the input dimension is large. We have found a new scheme to represent the Fisher information matrix. Based on this scheme, we have designed an algorithm to compute the inverse of the Fisher information matrix. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n²), while the complexity of conventional algorithms for the same purpose is of order O(n³). Simulations have confirmed the efficiency and robustness of the natural gradient learning rule.
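To make the setting concrete, the following is a minimal sketch of the natural gradient update θ ← θ − η F⁻¹g for a small single-hidden-layer perceptron, using the damped empirical Fisher matrix and a direct linear solve. This is the conventional O(n³)-per-step approach that the abstract contrasts with; the paper's O(n²) scheme exploits the structure of the Fisher matrix for MLPs and is not reproduced here. All names, sizes, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, N = 4, 3, 64            # toy sizes; the paper targets large n_in
dim = n_in * n_hidden + n_hidden        # hidden weights W plus output weights v

def unpack(theta):
    W = theta[:n_in * n_hidden].reshape(n_hidden, n_in)
    v = theta[n_in * n_hidden:]
    return W, v

def grad_sample(theta, x, t):
    """Per-sample gradient of the squared error for a tanh MLP with one output."""
    W, v = unpack(theta)
    h = np.tanh(W @ x)
    err = v @ h - t
    gW = err * np.outer(v * (1.0 - h**2), x)   # backprop through tanh hidden layer
    gv = err * h
    return np.concatenate([gW.ravel(), gv])

def mse(theta, X, y):
    W, v = unpack(theta)
    return np.mean((np.tanh(X @ W.T) @ v - y) ** 2)

X = rng.normal(size=(N, n_in))
y = np.tanh(X @ rng.normal(size=n_in))          # synthetic teacher targets
theta = 0.1 * rng.normal(size=dim)

eta, damping = 0.02, 1e-2                       # illustrative hyperparameters
loss_before = mse(theta, X, y)
for _ in range(200):
    G = np.stack([grad_sample(theta, x, t) for x, t in zip(X, y)])
    g = G.mean(axis=0)                          # averaged gradient
    F = G.T @ G / N + damping * np.eye(dim)     # damped empirical Fisher matrix
    theta -= eta * np.linalg.solve(F, g)        # natural-gradient step, O(dim^3) solve
loss_after = mse(theta, X, y)
```

The `np.linalg.solve` call is the bottleneck the paper addresses: its cost grows cubically with the parameter dimension, which for a wide input layer is dominated by n, hence the value of an O(n²) inversion scheme.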