Evaluation of constructive neural networks with cascaded architectures

Abstract In this study, we investigated five constructive neural network algorithms: four methods from the literature and one recently developed by us. The studied algorithms are Cascade-Correlation, Modified Cascade-Correlation, Cascade, Cascade Network, and our own Fixed Cascade Error. The algorithms share many similarities: all have a cascaded architecture, and all grow the neural network automatically by adding new hidden units as training proceeds. Furthermore, the networks are trained layer by layer, i.e. once a hidden unit is installed in the network, its input weights are frozen and do not change in later stages of training. The basic versions of the algorithms, which train a single randomly initialized candidate unit, were improved during this research in two ways: by a deterministic initialization method and by the use of multiple candidate units in the hidden-unit training phase. The key idea of the deterministic initialization method is to create a large pool of randomly initialized hidden units, of which only the best unit is trained further and installed in the network. With multiple candidate units, in contrast, several candidate units are trained to the final solution, after which the best of them is selected and installed as a hidden unit in the active network. The numerical simulations show that the multiple-candidate-unit versions in particular usually produce better results than the basic versions of the algorithms. In addition, the deterministic initialization method does not increase the computational cost of the algorithms; in most cases it even reduces the cost of network training. Finally, it is worth noting that our own algorithm quite often achieves the best performance level among the investigated algorithms.
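To make the candidate-pool idea concrete, here is a minimal Python sketch of the deterministic initialization step described above. It is an illustration only, not the authors' implementation: the scoring criterion (the magnitude of the covariance between a candidate's output and the network's residual error, as used in Cascade-Correlation) and all names and parameters (`candidate_score`, `pick_best_candidate`, `n_pool`) are assumptions made for this example.

```python
import numpy as np

def candidate_score(weights, X, residual):
    """Cascade-Correlation-style score: magnitude of the covariance
    between the candidate unit's output and the remaining network error."""
    out = np.tanh(X @ weights)                 # candidate activation on all patterns
    out_c = out - out.mean()                   # center the activation
    res_c = residual - residual.mean(axis=0)   # center the residual error
    return np.abs(out_c @ res_c).sum()

def pick_best_candidate(X, residual, n_pool=50, rng=None):
    """Deterministic initialization: draw a large pool of randomly
    initialized candidate units, score each one, and keep only the best.
    Only that unit would then be trained further and installed."""
    rng = np.random.default_rng(rng)
    n_inputs = X.shape[1]
    pool = rng.uniform(-1.0, 1.0, size=(n_pool, n_inputs))
    scores = [candidate_score(w, X, residual) for w in pool]
    return pool[int(np.argmax(scores))]

# Toy usage: random inputs (with a bias column) and a current residual error.
X = np.hstack([np.random.randn(100, 3), np.ones((100, 1))])
residual = np.random.randn(100, 1)
best_w = pick_best_candidate(X, residual, n_pool=50, rng=0)
# best_w would next be trained to convergence; its input weights are
# frozen when the unit is installed in the cascade. The multiple-candidate
# variant differs in that every unit in a (smaller) pool is trained to
# completion before the best one is selected.
```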
