Weight Groupings in Second Order Training Methods for Recurrent Networks

In this paper, we use a block-diagonal matrix to approximate the Hessian matrix in the Levenberg-Marquardt method during the training of recurrent neural networks. We analyze the weight updating strategies and the weight groupings associated with this approximation. Two weight updating strategies, namely asynchronous and synchronous updating, are investigated. The asynchronous method updates the weights of one block at a time, while the synchronous method updates all weights simultaneously. Variations of these two methods, which involve the determination of the two parameters mu and lambda, are examined. Four weight grouping methods, namely correlation blocks, k-unit blocks, layer blocks, and arbitrary blocks, are investigated and compared. Their computational complexity, approximation ability, and training time are analyzed. Compared with the original Levenberg-Marquardt method, the block-diagonal approximation methods give a substantial improvement in training time without degrading generalization ability.
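To make the block-diagonal idea concrete, the following is a minimal sketch, not the paper's recurrent-network implementation: it fits a hypothetical toy curve model whose weights are split into two illustrative blocks, and each block solves its own damped Gauss-Newton system (J_b^T J_b + mu I) delta_b = -J_b^T r using only that block's Jacobian columns. The `synchronous` flag switches between updating all blocks from the same residuals (synchronous) and updating one block at a time with residuals recomputed in between (asynchronous). All names and the grouping shown are assumptions for illustration only.

```python
import numpy as np

# Toy model for illustration only (not the paper's RNN):
# four weights grouped into two hypothetical blocks, (w[0], w[1]) and (w[2], w[3]).
def model(w, x):
    return w[0] * np.tanh(w[1] * x) + w[2] * x + w[3]

def residuals(w, x, y):
    return model(w, x) - y

def numerical_jacobian(w, x, y, eps=1e-6):
    # Finite-difference Jacobian of the residual vector with respect to the weights.
    r0 = residuals(w, x, y)
    J = np.zeros((r0.size, w.size))
    for j in range(w.size):
        wp = w.copy()
        wp[j] += eps
        J[:, j] = (residuals(wp, x, y) - r0) / eps
    return J

def block_lm_step(w, x, y, blocks, mu, synchronous=True):
    """One block-diagonal Levenberg-Marquardt step.

    Each block solves (J_b^T J_b + mu I) delta_b = -J_b^T r using only the
    Jacobian columns of its own weights (the block-diagonal approximation).
    Synchronous: all blocks use the same residuals and are applied together.
    Asynchronous: blocks are updated one at a time, recomputing residuals.
    """
    w = w.copy()
    if synchronous:
        J = numerical_jacobian(w, x, y)
        r = residuals(w, x, y)
        for b in blocks:
            Jb = J[:, b]
            delta = np.linalg.solve(Jb.T @ Jb + mu * np.eye(len(b)), -Jb.T @ r)
            w[b] += delta
    else:
        for b in blocks:
            J = numerical_jacobian(w, x, y)
            r = residuals(w, x, y)
            Jb = J[:, b]
            delta = np.linalg.solve(Jb.T @ Jb + mu * np.eye(len(b)), -Jb.T @ r)
            w[b] += delta  # applied immediately; the next block sees updated residuals
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-2, 2, 50)
    true_w = np.array([1.5, 0.8, -0.5, 0.3])
    y = model(true_w, x) + 0.01 * rng.standard_normal(x.size)

    w = np.full(4, 0.1)
    blocks = [np.array([0, 1]), np.array([2, 3])]  # an example grouping, chosen arbitrarily
    for _ in range(100):
        w = block_lm_step(w, x, y, blocks, mu=0.1, synchronous=False)
    print("fitted weights:", w)
```

Because each block system is solved independently, the cost of the solve scales with the block sizes rather than the full weight count, which is the source of the training-time savings discussed in the paper; the choice of grouping (correlation, k-unit, layer, or arbitrary blocks) determines how well the block-diagonal matrix approximates the full Hessian.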