Training of Large-Scale Feed-Forward Neural Networks

Neural processing of large-scale data sets containing both many input/output variables and a large number of training examples often leads to very large networks. Once these networks become large-scale in the truest sense of the word (several tens of thousands of weights), two major inconveniences, or possibly somewhat more than that, occur: (1) conventional training algorithms perform very poorly, and common knowledge about them may no longer be valid; and (2) training time and, even more importantly, memory limitations increasingly move into the focus of attention. Both issues are addressed in this paper by means of biomedical image segmentation based on supervised neural network classification of previously extracted image features.
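
As a rough illustration of the setting described above, the following minimal NumPy sketch trains a one-hidden-layer feed-forward classifier on extracted feature vectors with plain mini-batch gradient descent. All dimensions, layer sizes, and hyperparameters are illustrative assumptions rather than values taken from the paper; the point is only that mini-batch processing keeps the memory footprint bounded even when the number of training examples (e.g. pixels of a biomedical image) is very large.

```python
import numpy as np

# Hypothetical dimensions for a feature-based pixel classifier:
# 30 extracted image features per pixel, 200 hidden units, 4 tissue classes.
# This already gives 30*200 + 200*4 + biases ≈ 7,000 weights; realistic
# networks in this setting easily reach tens of thousands.
n_in, n_hidden, n_out = 30, 200, 4

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_out)); b2 = np.zeros(n_out)

def forward(X):
    """One hidden layer with tanh activation and a softmax output."""
    h = np.tanh(X @ W1 + b1)
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return h, p

def train_step(X, Y, lr=0.01):
    """One mini-batch step of gradient descent on the cross-entropy loss."""
    global W1, b1, W2, b2
    h, p = forward(X)
    grad_logits = (p - Y) / len(X)                # softmax + cross-entropy gradient
    gW2 = h.T @ grad_logits
    gb2 = grad_logits.sum(axis=0)
    grad_h = grad_logits @ W2.T * (1.0 - h ** 2)  # tanh derivative
    gW1 = X.T @ grad_h
    gb1 = grad_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Only one batch of feature vectors is resident in memory at a time,
# regardless of how many training examples the full data set contains.
X_batch = rng.normal(size=(256, n_in))            # synthetic feature vectors
labels = rng.integers(0, n_out, 256)
Y_batch = np.eye(n_out)[labels]                   # one-hot class targets
for _ in range(10):
    train_step(X_batch, Y_batch)
```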
