Weight Groupings in Second Order Training Methods for Recurrent Networks

In this paper, we use a block-diagonal matrix to approximate the Hessian matrix in the Levenberg-Marquardt method during the training of recurrent neural networks. We analyze the weight updating strategies and the weight groupings associated with this approximation. Two weight updating strategies, namely asynchronous and synchronous updating, are investigated. The asynchronous method updates the weights of one block at a time, while the synchronous method updates all weights simultaneously. Variations of these two methods, which involve the determination of the two parameters mu and lambda, are examined. Four weight grouping methods, namely correlation blocks, k-unit blocks, layer blocks, and arbitrary blocks, are investigated and compared. Their computational complexity, approximation ability, and training time are analyzed. Compared with the original Levenberg-Marquardt method, the block-diagonal approximation methods give a substantial improvement in training time without degrading generalization ability.
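To make the block-diagonal idea concrete, the following is a minimal sketch, not the paper's recurrent-network implementation: it fits a hypothetical toy curve model whose weights are split into two illustrative blocks, and each block solves its own damped Gauss-Newton system (J_b^T J_b + mu I) delta_b = -J_b^T r using only that block's Jacobian columns. The `synchronous` flag switches between updating all blocks from the same residuals (synchronous) and updating one block at a time with residuals recomputed in between (asynchronous). All names and the grouping shown are assumptions for illustration only.

```python
import numpy as np

# Toy model for illustration only (not the paper's RNN):
# four weights grouped into two hypothetical blocks, (w[0], w[1]) and (w[2], w[3]).
def model(w, x):
    return w[0] * np.tanh(w[1] * x) + w[2] * x + w[3]

def residuals(w, x, y):
    return model(w, x) - y

def numerical_jacobian(w, x, y, eps=1e-6):
    # Finite-difference Jacobian of the residual vector with respect to the weights.
    r0 = residuals(w, x, y)
    J = np.zeros((r0.size, w.size))
    for j in range(w.size):
        wp = w.copy()
        wp[j] += eps
        J[:, j] = (residuals(wp, x, y) - r0) / eps
    return J

def block_lm_step(w, x, y, blocks, mu, synchronous=True):
    """One block-diagonal Levenberg-Marquardt step.

    Each block solves (J_b^T J_b + mu I) delta_b = -J_b^T r using only the
    Jacobian columns of its own weights (the block-diagonal approximation).
    Synchronous: all blocks use the same residuals and are applied together.
    Asynchronous: blocks are updated one at a time, recomputing residuals.
    """
    w = w.copy()
    if synchronous:
        J = numerical_jacobian(w, x, y)
        r = residuals(w, x, y)
        for b in blocks:
            Jb = J[:, b]
            delta = np.linalg.solve(Jb.T @ Jb + mu * np.eye(len(b)), -Jb.T @ r)
            w[b] += delta
    else:
        for b in blocks:
            J = numerical_jacobian(w, x, y)
            r = residuals(w, x, y)
            Jb = J[:, b]
            delta = np.linalg.solve(Jb.T @ Jb + mu * np.eye(len(b)), -Jb.T @ r)
            w[b] += delta  # applied immediately; the next block sees updated residuals
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-2, 2, 50)
    true_w = np.array([1.5, 0.8, -0.5, 0.3])
    y = model(true_w, x) + 0.01 * rng.standard_normal(x.size)

    w = np.full(4, 0.1)
    blocks = [np.array([0, 1]), np.array([2, 3])]  # an example grouping, chosen arbitrarily
    for _ in range(100):
        w = block_lm_step(w, x, y, blocks, mu=0.1, synchronous=False)
    print("fitted weights:", w)
```

Because each block system is solved independently, the cost of the solve scales with the block sizes rather than the full weight count, which is the source of the training-time savings discussed in the paper; the choice of grouping (correlation, k-unit, layer, or arbitrary blocks) determines how well the block-diagonal matrix approximates the full Hessian.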