Statistical Aspects of Generalization in Neural Networks

The topic of this thesis is generalization in feedforward neural networks. The number of parameters in a model and the generalization ability of the model are tightly coupled. Neural networks usually contain a large number of parameters, and the problem of finding the optimal number of parameters in these functions (networks) can be addressed at two levels of 'precision': a) as a function of the network degree (the number of internal nodes), or b) as a function of the individual weights (pruning). Both levels are investigated in this thesis, but the emphasis is on pruning seen from the Minimum Description Length (MDL) perspective. There are two main results: a) under certain assumptions, pruning with an MDL measure can be shown to be equivalent to pruning with Optimal Brain Damage; b) analysis of the Sunspot series with MDL pruning shows better generalization than previously reported [25], and in some cases much better prediction of the future, although definitely not at the point of minimum description length. The thesis also contains an introduction to Valiant's Probably Approximately Correct (PAC) learning [26,1,6], Rissanen's Minimum Description Length (MDL) principle [22,15,14], and the Optimal Brain Damage (OBD) procedure of Le Cun et al. [8,25].
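As a brief orientation for result (a), the two criteria being compared can be sketched as follows; the notation below is illustrative and assumed, not quoted from the thesis. Optimal Brain Damage ranks weights by the saliency

\[
s_i \;=\; \tfrac{1}{2}\, \frac{\partial^2 E}{\partial w_i^2}\, w_i^2 ,
\]

the estimated increase in the training error $E$ when weight $w_i$ is set to zero (using a diagonal approximation of the Hessian), while two-part MDL scores a pruned network by the total description length

\[
L(\mathcal{D}, \mathbf{w}) \;=\; L(\mathcal{D} \mid \mathbf{w}) \;+\; L(\mathbf{w}) ,
\]

the code length of the data given the model plus the code length of the model itself. Under the assumptions made in the thesis, removing the weight with the smallest saliency corresponds to the smallest increase in description length, which is the equivalence stated in result (a).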

[1] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.

[2] H. Akaike. Fitting autoregressive models for prediction, 1969.

[3] Yaser S. Abu-Mostafa, et al. The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning, 1989, Neural Computation.

[4] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[5] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[6] Vladimir Vapnik, A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[7] Lars Kai Hansen, et al. On design and evaluation of tapped-delay neural network architectures, 1993, IEEE International Conference on Neural Networks.

[8] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[9] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[10] James A. Anderson, et al. Neurocomputing: Foundations of Research, 1988.

[11] Gregory J. Wolff, et al. Optimal Brain Surgeon and general network pruning, 1993, IEEE International Conference on Neural Networks.

[12] W. S. McCulloch, et al. A logical calculus of the ideas immanent in nervous activity, 1990, The Philosophy of Artificial Intelligence.

[13] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[14] Anders Krogh, et al. Introduction to the theory of neural computation, 1994, The Advanced Book Program.

[15] Jukka Saarinen, et al. Chaotic time series modeling with optimum neural network architecture, 1993, Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan).

[16] Jukka Saarinen, et al. Neural Network Modeling and Prediction of Multivariate Time Series Using Predictive MDL Principle, 1993.

[17] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.