SUPERVISED TRAINING USING GLOBAL SEARCH METHODS

Supervised learning in neural networks based on the popular backpropagation method can often be trapped in a local minimum of the error function. Backpropagation-type training algorithms are local minimization methods and have no mechanism for escaping the influence of a local minimum. Local minima arise because the error function is a superposition of nonlinear activation functions whose minima may occur at different points, which can render the error function nonconvex. This work investigates the use of global search methods for batch-mode training of feedforward multilayer perceptrons. Global search methods are expected to lead to “optimal” or “near-optimal” weight configurations by allowing the network to escape local minima during training and, in that sense, to improve the efficiency of the learning process. The paper reviews the fundamentals of simulated annealing, genetic and evolutionary algorithms, as well as some recently proposed deflection procedures. Simulations and comparisons are presented.
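
As a concrete illustration of the first of these approaches, the sketch below applies simulated annealing to batch-mode training of a small multilayer perceptron on the XOR task. This is a minimal sketch and not the paper's procedure: the 2-2-1 topology, the geometric cooling schedule, the perturbation scale, and the iteration budget are all illustrative assumptions.

# Minimal simulated-annealing sketch for batch-mode weight training.
# NOT the paper's exact procedure: topology, cooling schedule and
# step scale are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Batch-mode XOR task: every error evaluation uses all four patterns.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def forward(w, X):
    """2-2-1 perceptron with logistic activations; w is a flat weight vector."""
    W1 = w[:4].reshape(2, 2)   # input -> hidden weights
    b1 = w[4:6]                # hidden biases
    W2 = w[6:8].reshape(2, 1)  # hidden -> output weights
    b2 = w[8:9]                # output bias
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def batch_error(w):
    """Sum-of-squares error over the whole training batch."""
    return float(np.sum((forward(w, X) - T) ** 2))

w = rng.normal(scale=0.5, size=9)
E = batch_error(w)
temperature = 1.0
for step in range(20000):
    # Propose a random perturbation of the current weight vector.
    w_new = w + rng.normal(scale=0.1, size=w.shape)
    E_new = batch_error(w_new)
    # Always accept downhill moves; accept uphill moves with Boltzmann
    # probability, which is what lets the search escape local minima.
    if E_new < E or rng.random() < np.exp((E - E_new) / temperature):
        w, E = w_new, E_new
    temperature *= 0.9995  # geometric cooling schedule

print(f"final batch error: {E:.4f}")
print("outputs:", forward(w, X).ravel().round(2))

The Boltzmann acceptance test is the essential ingredient: occasionally accepting uphill moves is what distinguishes the method from pure local descent and gives the search a chance to leave the basin of attraction of a local minimum.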
