Data Driven Multiple Neural Network Models Generator Based on a Tree-like Scheduler

In this paper we describe a new penalty-based model selection criterion for nonlinear models, based on the influence of noise on the fitting process. Following Occam's razor, we should prefer simpler models over complex ones and optimize the trade-off between model complexity and how accurately the model describes the training data. An empirical derivation is developed, and computer simulations for multilayer perceptrons with weight decay regularization are carried out to demonstrate the efficiency and robustness of the method in comparison with other well-known criteria for nonlinear systems.
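The trade-off the abstract describes can be illustrated with a generic penalty-based criterion. The sketch below uses an AIC-style score (training error term plus a parameter-count penalty) purely as an example; it is not the paper's noise-derived criterion, and the candidate network names and error values are hypothetical.

```python
import math

def penalty_criterion(n_samples, rss, n_params):
    """AIC-style score for a Gaussian-noise model:
    n * ln(RSS / n) + 2k. Smaller is better.
    A generic penalty criterion, used here only for illustration."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params

def select_model(candidates, n_samples):
    """candidates: list of (name, rss, n_params) tuples.
    Returns the name of the candidate minimizing the criterion."""
    best = min(candidates,
               key=lambda c: penalty_criterion(n_samples, c[1], c[2]))
    return best[0]

# Hypothetical candidate MLPs: (name, residual sum of squares, #parameters).
# Larger nets fit the training data better but pay a complexity penalty.
candidates = [
    ("mlp_2_hidden", 12.0, 9),
    ("mlp_5_hidden", 8.5, 21),
    ("mlp_20_hidden", 8.3, 81),
]
print(select_model(candidates, n_samples=100))  # → mlp_5_hidden
```

Note how the 20-unit network achieves the lowest training error yet loses the selection: its penalty term outweighs the marginal gain in fit, which is exactly the Occam's-razor behavior a penalty-based criterion is designed to enforce.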
