Objective functions for training new hidden units in constructive neural networks

In this paper, we study a number of objective functions for training new hidden units in constructive algorithms for multilayer feedforward networks. The aim is to derive a class of objective functions whose evaluation, together with the corresponding weight updates, takes O(N) time, where N is the number of training patterns. Moreover, although input weight freezing is applied during training for computational efficiency, constructive algorithms using these objective functions still retain their convergence property. We also propose a few computational tricks that can improve the optimization of the objective functions in practice. The relative performance of these objective functions on a set of two-dimensional regression problems is also discussed.
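The abstract does not state the objective functions themselves, so the following is only an illustrative sketch under stated assumptions. It assumes one objective of the kind described, S = (sum_p e_p h_p)^2 / sum_p h_p^2, where e_p is the residual of the current network on pattern p (fixed, because the existing units are frozen) and h_p is the output of the candidate hidden unit. The tanh activation, the gradient-ascent loop with step halving, and all function names are hypothetical choices made for the example, not the paper's method. Each evaluation of S and its gradient is a single pass over the training patterns, so the cost is linear in N for a fixed input dimension.

```python
import math


def unit_output(w, b, x):
    """Output of a candidate tanh hidden unit for one input pattern x."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def objective_and_gradient(w, b, inputs, residuals):
    """Return S = (sum e*h)^2 / sum h^2 and its gradient w.r.t. (w, b).

    A single pass over the N patterns, so the cost is linear in N.
    """
    d = len(w)
    eh = 0.0                 # running sum of e_p * h_p
    hh = 1e-12               # running sum of h_p^2 (small guard against 0)
    g_eh = [0.0] * (d + 1)   # gradient of eh; last entry is for the bias
    g_hh = [0.0] * (d + 1)   # gradient of hh; last entry is for the bias
    for x, e in zip(inputs, residuals):
        h = unit_output(w, b, x)
        dh = 1.0 - h * h     # derivative of tanh at the pre-activation
        eh += e * h
        hh += h * h
        x_ext = list(x) + [1.0]
        for j in range(d + 1):
            g_eh[j] += e * dh * x_ext[j]
            g_hh[j] += 2.0 * h * dh * x_ext[j]
    s = eh * eh / hh
    grad = [2.0 * eh * g_eh[j] / hh - eh * eh * g_hh[j] / (hh * hh)
            for j in range(d + 1)]
    return s, grad


def train_candidate(w, b, inputs, residuals, steps=200, lr=0.1):
    """Gradient ascent on S; a step is kept only if S increases."""
    s, grad = objective_and_gradient(w, b, inputs, residuals)
    for _ in range(steps):
        w_new = [wi + lr * gi for wi, gi in zip(w, grad[:-1])]
        b_new = b + lr * grad[-1]
        s_new, grad_new = objective_and_gradient(w_new, b_new, inputs, residuals)
        if s_new > s:
            w, b, s, grad = w_new, b_new, s_new, grad_new
        else:
            lr *= 0.5        # step was too large; shrink it and retry
    return w, b, s


def add_unit(w, b, inputs, residuals):
    """Fit the new unit's output weight by least squares; return it and the new residuals."""
    hs = [unit_output(w, b, x) for x in inputs]
    eh = sum(e * h for e, h in zip(residuals, hs))
    hh = sum(h * h for h in hs) + 1e-12
    beta = eh / hh
    return beta, [e - beta * h for e, h in zip(residuals, hs)]
```

Under these assumptions, once the output weight is fitted by least squares, the sum of squared residuals drops by S (up to the small numerical guard), which is why maximizing S is a natural target when choosing the new unit's input weights.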
