Efficacy of modified backpropagation and optimisation methods on a real-world medical problem

A wide range of modifications to the backpropagation (BP) algorithm, motivated by heuristic arguments and by optimisation theory, has been examined on a real-world medical signal classification problem. The method of choice depends both on the nature of the learning task and on whether learning is to be optimised for speed or for generalisation. It was found that standard BP was comparatively fast and gave good generalisation when the task was to learn the training set within a given error tolerance. If, however, the task was to find the global minimum, standard BP failed to do so within 100,000 iterations, whereas first-order methods which adapt the stepsize were as fast as, if not faster than, conjugate gradient and quasi-Newton methods. To achieve optimum performance, the second-order methods required as much fine tuning of their line-search and restart parameters as the first-order methods required of their stepsize parameters.
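
The contrast between standard BP and a first-order method with an adaptive stepsize can be illustrated with a minimal sketch. The Python snippet below is not taken from the paper: the toy data, network size, and the up/down adaptation factors are illustrative assumptions, and the adaptation rule shown is a SuperSAB/delta-bar-delta-style per-weight scheme, i.e. one example of the class of "first-order methods which adapt the stepsize" referred to above.

```python
# Minimal sketch: plain gradient-descent BP versus a per-weight adaptive
# stepsize (SuperSAB / delta-bar-delta flavour). Data, architecture and
# hyperparameters are illustrative, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data standing in for the medical signal features.
X = rng.normal(size=(200, 8))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

def init_params(n_in=8, n_hid=6, n_out=1):
    return [rng.normal(scale=0.3, size=(n_in, n_hid)),
            rng.normal(scale=0.3, size=(n_hid, n_out))]

def forward(params, X):
    W1, W2 = params
    h = np.tanh(X @ W1)                                  # hidden layer
    z = np.clip(h @ W2, -30.0, 30.0)
    out = 1.0 / (1.0 + np.exp(-z))                       # sigmoid output
    return h, out

def gradients(params, X, y):
    W1, W2 = params
    h, out = forward(params, X)
    err = out - y                                        # dE/dz for cross-entropy + sigmoid
    gW2 = h.T @ err / len(X)
    gW1 = X.T @ ((err @ W2.T) * (1.0 - h**2)) / len(X)
    loss = np.mean(-(y * np.log(out + 1e-12) + (1 - y) * np.log(1 - out + 1e-12)))
    return [gW1, gW2], loss

def train_plain_bp(params, lr=0.5, epochs=2000):
    # Standard BP: a single fixed stepsize for every weight.
    for _ in range(epochs):
        grads, _ = gradients(params, X, y)
        params = [W - lr * g for W, g in zip(params, grads)]
    return gradients(params, X, y)[1]

def train_adaptive(params, lr0=0.5, up=1.05, down=0.5, epochs=2000):
    # Per-weight stepsizes grow when successive gradients agree in sign
    # and shrink when they disagree.
    lrs = [np.full_like(W, lr0) for W in params]
    prev = [np.zeros_like(W) for W in params]
    for _ in range(epochs):
        grads, _ = gradients(params, X, y)
        for i, g in enumerate(grads):
            agree = np.sign(g) == np.sign(prev[i])
            lrs[i] = np.clip(np.where(agree, lrs[i] * up, lrs[i] * down),
                             1e-6, 10.0)
            params[i] = params[i] - lrs[i] * g
        prev = grads
    return gradients(params, X, y)[1]

print("plain BP loss:    ", train_plain_bp(init_params()))
print("adaptive-lr loss: ", train_adaptive(init_params()))
```

The design choice behind the adaptive variant is that a consistent gradient sign for a weight suggests the stepsize is too cautious and can be increased, while a sign reversal signals overshooting and triggers a decrease, giving each weight its own effective learning rate without any second-order information.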
