Exploring constructive cascade networks

Constructive algorithms have proved to be powerful methods for training feedforward neural networks. An important consideration for these algorithms is how well the networks they construct generalize. A series of empirical studies was performed to examine the effect of regularization on generalization in constructive cascade algorithms. It was found that combining early stopping with regularization produced better generalization than early stopping alone. A cubic penalty term, which penalizes large weights heavily, was shown to benefit generalization in cascade networks. An adaptive method for setting the regularization magnitude in constructive algorithms was introduced and shown to produce generalization results similar to those obtained with a fixed, user-optimized regularization setting; it also resulted in the construction of smaller networks for more complex problems. The acasper algorithm, which incorporates the insights gained from these studies, was shown to have good generalization and network-construction properties, and was compared with the cascade correlation algorithm on the Proben 1 benchmark and additional regression data sets.
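
For concreteness, the cubic penalty described above can be sketched in the standard additive form: the unregularized training error E_0 is augmented with a term that grows with the cube of each weight's magnitude. The exact coefficients and scheduling used in the studies are not reproduced here, so treat this as a sketch of the general shape rather than the precise objective.

```latex
E(\mathbf{w}) = E_0(\mathbf{w}) + \lambda \sum_i |w_i|^3,
\qquad
\frac{\partial E}{\partial w_i} = \frac{\partial E_0}{\partial w_i} + 3\lambda\,|w_i|\,w_i
```

Because the penalty grows cubically, large weights are punished far more severely than under quadratic weight decay, while small weights are left almost untouched.

The sketch below shows how such a penalty might be combined with early stopping and an adaptive regularization magnitude in a plain gradient-based training loop. The `net` interface and the adaptive rule (tying lambda to the current training error so the penalty fades as the fit improves) are illustrative assumptions, not the acasper algorithm itself.

```python
import numpy as np

def cubic_penalty(w, lam):
    # Cubic penalty lam * sum(|w|^3): grows much faster than L2 decay
    # for large weights, discouraging them strongly while leaving
    # small weights almost unaffected.
    return lam * np.sum(np.abs(w) ** 3)

def cubic_penalty_grad(w, lam):
    # d/dw |w|^3 = 3 |w| w, so the penalty gradient is 3 * lam * |w| * w.
    return 3.0 * lam * np.abs(w) * w

def train(net, train_set, val_set, lam0=1e-4, patience=20, max_epochs=2000):
    # `net` is assumed to expose loss/gradient/get_weights/step/
    # snapshot/restore methods; these names are placeholders, not an
    # API from the paper.
    best_val, stall = np.inf, 0
    for epoch in range(max_epochs):
        train_err = net.loss(train_set)
        # Illustrative adaptive rule: scale the regularization magnitude
        # with the current training error (an assumption, not the exact
        # schedule from the paper).
        lam = lam0 * train_err
        w = net.get_weights()
        grad = net.gradient(train_set) + cubic_penalty_grad(w, lam)
        net.step(grad)
        # Early stopping on a validation set, used together with the
        # penalty rather than instead of it.
        val_err = net.loss(val_set)
        if val_err < best_val:
            best_val, stall = val_err, 0
            net.snapshot()         # remember the best weights seen so far
        else:
            stall += 1
            if stall >= patience:  # no improvement for `patience` epochs
                break
    net.restore()                  # roll back to the best snapshot
    return net
```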
