Adaptive learning rate for increasing learning speed in backpropagation networks

Backpropagation is a supervised learning algorithm for training multi-layer neural networks for function approximation and pattern classification by minimizing a suitably defined error metric (e.g., the mean square error between the desired and actual outputs of the network for a training set) using gradient descent. It does this by calculating the partial derivative of the overall error and changing each weight by a small amount (determined by the learning rate) in a direction that is expected to reduce the error. Despite its success on a number of real-world problems, backpropagation can be very slow (it requires hundreds of passes (epochs) through the training set). Also, its performance is extremely sensitive to the choice of parameters such as the learning rate. The mathematical considerations that go into the derivation of backpropagation require that the learning rate be as small as possible. On the other hand, in order reduce the number training epochs required to learn the desired input-output mapping, it is desirable to make the weight changes as large as possible without causing the error to increase. Also it is desirable to have an algorithm that can change the learning rate dynamically so that it is close to optimal (in terms of reducing the error as much as possible given the local gradient information) at each epoch (thereby eliminating the reliance on guesswork in the choice of the learning rate). Several authors have proposed methods for adaptively adjusting the learning rate based on certain assumptions about the shape of the error surface (e.g., quadratic) and/or estimation of higher order derivatives of the error with respect to the weights at a given point in the weight space. The primary disadvantage of such methods is that the estimation of second order derivatives is not only computationally expensive but also prone to inaccuracy due to the approximation of derivatives by discrete differences. In this paper we propose and evaluate a heuristically motivated method for adaptive modification of the learning rate in backpropagation that does not require the estimation of higher order derivatives. We present a modified version of the Backpropagation learning algorithm which uses a simple heuristic to come up with a learning parameter value at each epoch. We present numerous simulations on real-world data sets, using our modified algorithm. We compare these results with results got through standard backpropagation learning algorithm, and also various modifications of the standard backpropagation algorithm (e.g., flat-spot elimination methods) that have been discussed in the literature. Our simulation results suggest that the adaptive learning rate modification helps substantially speed up the convergence of backpropagation algorithm. Furthermore, it makes the initial choice of the learning rate fairly unimportant as our method allows the learning rate to change and settle at a reasonable value for the specific problem. As a standard against which to compare our results, we computed the quasi-optimal value of the learning parameter at each epoch. Simulation results indicate that our heuristic modification matches the performance of backpropagation with the quasi-optimal learning rate. The computational complexity of this algorithm is analysed and compared with that of standard backpropagation.