Statistical Aspects of Generalization in Neural Networks

The topic of this thesis is generalization in feedforward neural networks. The number of parameters in a model and the generalization ability of the model are tightly coupled. Neural networks usually contain a large number of parameters, and the problem of finding the optimal number of parameters in these functions (networks) can be addressed at two levels of 'precision': a) as a function of the network degree (the number of internal nodes), or b) as a function of the individual weights (pruning). Both levels are investigated in this thesis, but the emphasis is on pruning seen from the Minimum Description Length (MDL) perspective. There are two main results: a) under certain assumptions, pruning with an MDL measure can be shown to be equivalent to pruning with Optimal Brain Damage; b) analysis of the Sunspot series with MDL pruning shows better generalization than previously reported [25], and in some cases much better prediction of the future, although definitely not at the point of minimum description length. The thesis also contains an introduction to Valiant's Probably Approximately Correct (PAC) learning [26,1,6], Rissanen's Minimum Description Length (MDL) principle [22,15,14], and the Optimal Brain Damage (OBD) procedure of Le Cun et al. [8,25].
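As a brief orientation for result (a), the two criteria being compared can be sketched as follows; the notation below is illustrative and assumed, not quoted from the thesis. Optimal Brain Damage ranks weights by the saliency

\[
s_i \;=\; \tfrac{1}{2}\, \frac{\partial^2 E}{\partial w_i^2}\, w_i^2 ,
\]

the estimated increase in the training error $E$ when weight $w_i$ is set to zero (using a diagonal approximation of the Hessian), while two-part MDL scores a pruned network by the total description length

\[
L(\mathcal{D}, \mathbf{w}) \;=\; L(\mathcal{D} \mid \mathbf{w}) \;+\; L(\mathbf{w}) ,
\]

the code length of the data given the model plus the code length of the model itself. Under the assumptions made in the thesis, removing the weight with the smallest saliency corresponds to the smallest increase in description length, which is the equivalence stated in result (a).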

[1] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.

[2] H. Akaike. Fitting autoregressive models for prediction, 1969.

[3] Yaser S. Abu-Mostafa, et al. The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning, 1989, Neural Computation.

[4] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[5] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[6] Vladimir Vapnik, A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[7] Lars Kai Hansen, et al. On design and evaluation of tapped-delay neural network architectures, 1993, IEEE International Conference on Neural Networks.

[8] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[9] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[10] James A. Anderson, et al. Neurocomputing: Foundations of Research, 1988.

[11] Gregory J. Wolff, et al. Optimal Brain Surgeon and general network pruning, 1993, IEEE International Conference on Neural Networks.

[12] W. S. McCulloch, et al. A logical calculus of the ideas immanent in nervous activity, 1990, The Philosophy of Artificial Intelligence.

[13] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[14] Anders Krogh, et al. Introduction to the theory of neural computation, 1994, The Advanced Book Program.

[15] Jukka Saarinen, et al. Chaotic time series modeling with optimum neural network architecture, 1993, Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan).

[16] Jukka Saarinen, et al. Neural Network Modeling and Prediction of Multivariate Time Series Using Predictive MDL Principle, 1993.

[17] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.