Comparing gradient-based learning methods for optimizing predictive neural networks

In this paper, we compare the performance of several gradient-based techniques for optimizing the neural networks used in predictive modeling. Neural-network-based predictive models are trained with gradient-based techniques, which search for a minimum of a multidimensional energy function by applying step-wise corrective adjustments to the weight vectors of the hidden layers. The convergence of the different gradient techniques is studied and compared through experiments in the Neural Network Toolbox package of MATLAB. Large data sets extracted from a live data warehouse in the life insurance sector are used with the gradient methods to develop the predictive models. The convergence behavior of four learning methods is observed: gradient descent, Levenberg-Marquardt, conjugate gradient, and scaled conjugate gradient.
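The step-wise weight adjustment described in the abstract can be illustrated with a minimal sketch of plain gradient descent. This is not the paper's MATLAB setup: the single linear unit, the toy data, the learning rate `lr`, and the function names `sse`, `gradient`, and `gradient_descent` are all assumptions chosen for illustration. The energy function here is a halved sum-of-squares error, and each step moves the weights against its gradient.

```python
def sse(w, b, xs, ys):
    """Halved sum-of-squares error of the model y = w*x + b (the 'energy')."""
    return 0.5 * sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))


def gradient(w, b, xs, ys):
    """Partial derivatives of the energy with respect to w and b."""
    dw = sum((w * x + b - y) * x for x, y in zip(xs, ys))
    db = sum((w * x + b - y) for x, y in zip(xs, ys))
    return dw, db


def gradient_descent(xs, ys, lr=0.05, steps=500):
    """Repeatedly step against the gradient, recording the energy after each step."""
    w, b = 0.0, 0.0  # hypothetical starting weights
    history = [sse(w, b, xs, ys)]
    for _ in range(steps):
        dw, db = gradient(w, b, xs, ys)
        w -= lr * dw  # corrective adjustment of the weight
        b -= lr * db  # corrective adjustment of the bias
        history.append(sse(w, b, xs, ys))
    return w, b, history


# Toy data (an assumption, not the insurance data): generated by y = 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, history = gradient_descent(xs, ys)
```

The recorded `history` is the kind of convergence curve such a comparison examines. Levenberg-Marquardt, conjugate gradient, and scaled conjugate gradient differ in how they choose the direction and length of each step, and they are not implemented in this sketch.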
