Neural Network Modeling for Small Datasets

Neural network modeling for small datasets can be justified from a theoretical point of view by results of Bartlett [26] showing that the generalization performance of a multilayer perceptron (MLP) depends more on the L1 norm ‖c‖1 of the weights between the hidden layer and the output layer than on the total number of weights. In this article we investigate some geometrical properties of MLPs and, drawing on linear projection theory, propose an equivalent number of degrees of freedom to be used in neural model selection criteria such as the Akaike information criterion and the Bayesian information criterion, and in the unbiased estimation of the error variance. This measure proves to be much smaller than the total number of network parameters, which is the count usually adopted, and it does not depend on the number of input variables. Moreover, this concept is compatible with Bartlett's results and with similar ideas long associated with projection-based models and kernel models. Some numerical studies involving both real and simulated datasets are presented and discussed.
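The role such an equivalent degrees-of-freedom count plays in these criteria can be made concrete with a small numerical sketch. The Python snippet below is illustrative only: the function name gaussian_ic, the toy data, the hypothetical 5-10-1 MLP with 71 weights, and the sample edf values are our own constructions, not taken from the article. It computes AIC and BIC for a regression fit under an assumed Gaussian-error likelihood, plugging a user-supplied effective degrees of freedom into the penalty term in place of the raw weight count, and forms the corresponding degrees-of-freedom-corrected error-variance estimate.

```python
import numpy as np

def gaussian_ic(y, y_hat, edf):
    """AIC/BIC for a regression fit under Gaussian errors, with an
    effective number of degrees of freedom `edf` used in the penalty
    instead of the raw weight count.  How to compute `edf` for an MLP
    is the subject of the article; here it is simply supplied.
    """
    n = len(y)
    rss = float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))
    sigma2_ml = rss / n                         # ML estimate of the error variance
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2_ml) + 1.0)
    aic = -2.0 * log_lik + 2.0 * edf            # Akaike information criterion
    bic = -2.0 * log_lik + np.log(n) * edf      # Bayesian information criterion
    sigma2_unbiased = rss / (n - edf)           # d.f.-corrected error variance
    return aic, bic, sigma2_unbiased

# Toy comparison: the same fit penalized by the raw parameter count of a
# hypothetical 5-10-1 MLP (10*(5+1) + (10+1) = 71 weights) versus a much
# smaller equivalent degrees-of-freedom value.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
y_hat = y + rng.normal(scale=0.3, size=200)     # stand-in for MLP predictions
print(gaussian_ic(y, y_hat, edf=71.0))          # penalty from the raw weight count
print(gaussian_ic(y, y_hat, edf=7.0))           # penalty from a smaller equivalent d.f.
```

With the smaller penalty, richly parameterized networks are no longer automatically rejected by AIC or BIC on small samples; this is the practical consequence of replacing the raw weight count in the criteria.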

[1] Isabella Morlini, et al., Modelli Neuronali per piccoli insiemi di dati [Neural Models for Small Datasets], 2002.

[2] G. Schwarz, Estimating the Dimension of a Model, 1978.

[3] Yoshua Bengio, et al., Pattern Recognition and Neural Networks, 1995.

[4] Guozhong An, et al., The Effects of Adding Noise During Backpropagation Training on a Generalization Performance, 1996, Neural Computation.

[5] Richard D. De Veaux, et al., Discussion of "Neural Networks in Applied Statistics", 1996.

[6] Terrence L. Fine, et al., Feedforward Neural Network Methodology, 1999, Information Science and Statistics.

[7] S. Ingrassia, et al., On the Degrees of Freedom in Richly Parameterised Models, 2004.

[8] N. Lazar, et al., Methods and Criteria for Model Selection, 2004.

[9] C. Lee Giles, et al., What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation, 1998.

[10] R. D. De Veaux, et al., Prediction intervals for neural networks via nonlinear regression, 1998.

[11] John E. Moody, The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems, 1991, NIPS.

[12] Trevor Hastie, et al., The Elements of Statistical Learning, 2001.

[13] D. Ruppert, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[14] Duker, Discussion I, 1993, Ophthalmology.

[15] Richard D. De Veaux, et al., Multicollinearity: A tale of two nonparametric regressions, 1994.

[16] Hal S. Stern, Neural networks in applied statistics, 1996.

[17] Heekuck Oh, et al., Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[18] H. Akaike, Statistical predictor identification, 1970.

[19] Lyle H. Ungar, et al., A comparison of two nonparametric estimation schemes: MARS and neural networks, 1993.

[20] Jianming Ye, On Measuring and Correcting the Effects of Data Mining and Model Selection, 1998.

[21] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, 1996.

[22] H. Akaike, A new look at the statistical model identification, 1974.

[23] David J. C. MacKay, Bayesian Interpolation, 1992, Neural Computation.

[24] J. Hodges, et al., Counting degrees of freedom in hierarchical and other richly-parameterised models, 2001.

[25] M. J. Kearns, R. E. Schapire, Efficient distribution-free learning of probabilistic concepts, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science.

[26] Peter L. Bartlett, The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.

[27] Salvatore Ingrassia, Geometrical Aspects of Discrimination by Multilayer Perceptrons, 1999.

[28] Robert Azencott, et al., Synchronous Boltzmann machines and curve identification tasks, 1993.

[29] J. H. Friedman, Multivariate adaptive regression splines, 1991.

[30] Terrence L. Fine, et al., Neural-network design for small training sets of high dimension, 1998, IEEE Trans. Neural Networks.