A Numerical Study on Learning Curves in Stochastic Multilayer Feedforward Networks

The universal asymptotic scaling laws proposed by Amari et al. are studied in large-scale simulations on a CM-5. Small stochastic multilayer feedforward networks trained with backpropagation are investigated. For large numbers of training patterns t, the asymptotic generalization error scales as 1/t, as predicted. For a medium range of t, a faster 1/t^2 scaling is observed; this effect is explained by higher-order corrections of the likelihood expansion. For small t, the scaling law changes drastically when the network undergoes a transition from strong overfitting to effective learning.
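The scaling exponents discussed above can be read off a simulated learning curve by linear regression in log-log space, since e_g(t) ~ c * t^(-alpha) implies log e_g = log c - alpha * log t. A minimal sketch, using synthetic data in place of measured generalization errors (the constant c and the pattern counts t are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Synthetic generalization errors following the predicted 1/t asymptotics;
# c and the pattern counts t are hypothetical stand-ins for simulation data.
t = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
c = 5.0
e_g = c / t

# The slope of log(e_g) versus log(t) is -alpha; fit a line and negate it.
alpha = -np.polyfit(np.log(t), np.log(e_g), 1)[0]
print(round(alpha, 3))  # -> 1.0 for exact 1/t data
```

With real simulation data, the fitted alpha would be close to 1 in the large-t regime and close to 2 in the medium-t regime described in the abstract.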

[1] K. Takeuchi et al., Asymptotic efficiency of statistical estimators: concepts and higher order asymptotic efficiency, 1981.

[2] Shun-ichi Amari et al., Differential-geometrical methods in statistics, 1985.

[3] David Haussler et al., What Size Net Gives Valid Generalization?, Neural Computation, 1989.

[4] M. Opper et al., On the ability of the optimal perceptron to generalise, 1990.

[5] Sompolinsky et al., Learning from examples in large neural networks, Physical Review Letters, 1990.

[6] Heskes et al., Learning processes in neural networks, Physical Review A, 1991.

[7] David Haussler et al., Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise, COLT '91, 1991.

[8] Hansel et al., Broken symmetries in multilayered perceptrons, Physical Review A, 1992.

[9] Sompolinsky et al., Statistical mechanics of learning from examples, Physical Review A, 1992.

[10] Shun-ichi Amari et al., Learning Curves, Model Selection and Complexity of Neural Networks, NIPS, 1992.

[11] T. Watkin et al., The statistical mechanics of learning a rule, 1993.

[12] Shun-ichi Amari et al., Statistical Theory of Learning Curves under Entropic Loss Criterion, Neural Computation, 1993.

[13] H. Schwarze et al., Generalization in Fully Connected Committee Machines, 1993.

[14] Oh et al., Generalization in a two-layer neural network, Physical Review E, 1993.

[15] Michael Finke et al., Estimating A-Posteriori Probabilities using Stochastic Network Models, 1993.

[16] David Haussler et al., Rigorous Learning Curve Bounds from Statistical Mechanics, COLT '94, 1994.

[17] Klaus-Robert Müller et al., Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?, NIPS, 1995.

[18] Saad et al., On-line learning in soft committee machines, Physical Review E, 1995.

[19] Dana Ron et al., An experimental and theoretical comparison of model selection methods, COLT '95, 1995.

[20] Saad et al., Exact solution for on-line learning in multilayer neural networks, Physical Review Letters, 1995.

[21] F. Komaki, On asymptotic properties of predictive distributions, 1996.

[22] Klaus-Robert Müller et al., Asymptotic statistical theory of overtraining and cross-validation, IEEE Transactions on Neural Networks, 1997.

[23] Manfred Opper et al., Statistical mechanics of generalization, 1998.

[24] S. Amari et al., Large Scale Simulations for Learning Curves, 2022.