Dynamics of batch training in a perceptron

Early stopping and weight decay are studied in a linear perceptron using a new, simplified approach to the dynamics in the thermodynamic limit. The approach is deduced directly from the gradient descent weight update and yields an exact description of the dynamics of the batch training process. The results are compared with a recent study of early stopping and weight decay based on the equilibrium statistical mechanics approach: the equilibrium results are shown to be good approximations for early stopping and exact for weight decay. Furthermore, the dynamical approach makes it possible to determine the number of training steps needed to satisfy a given termination condition. It can be shown that asymptotically, i.e. when the number of examples is large, only two batch steps suffice to reach optimal convergence, provided the learning rate is chosen optimally.
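As a worked illustration of the setting (a minimal sketch; the teacher-student setup, parameter values, and variable names below are illustrative assumptions, not taken from the paper), the batch gradient descent update for a linear perceptron with quadratic training error $E(w) = \frac{1}{2P}\sum_{\mu=1}^{P}(y^\mu - w\cdot x^\mu)^2$ and weight decay strength $\lambda$ reads $w_{t+1} = (1-\eta\lambda)\,w_t - \eta\,\nabla E(w_t)$, with learning rate $\eta$. The sketch trains such a student on noisy examples from a linear teacher and tracks a validation error, as one would for early stopping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher-student setup: a linear "teacher" generates noisy
# targets; a linear "student" is trained by batch gradient descent.
N = 100        # input dimension
P = 200        # number of training examples
P_val = 200    # validation examples used for early stopping
noise = 0.5    # std of additive output noise
eta = 0.5      # learning rate (illustrative value)
lam = 0.1      # weight-decay strength (illustrative value)

teacher = rng.standard_normal(N) / np.sqrt(N)

def make_data(P):
    X = rng.standard_normal((P, N))
    y = X @ teacher + noise * rng.standard_normal(P)
    return X, y

X, y = make_data(P)
X_val, y_val = make_data(P_val)

def mse(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

w = np.zeros(N)
best_w, best_val = w.copy(), mse(w, X_val, y_val)

for t in range(200):
    # Batch update with weight decay:
    #   w <- (1 - eta*lam) * w - (eta/P) * X^T (X w - y)
    grad = X.T @ (X @ w - y) / P
    w = (1.0 - eta * lam) * w - eta * grad

    val = mse(w, X_val, y_val)
    if val < best_val:
        best_w, best_val = w.copy(), val
    # (a practical early-stopping rule would halt once val starts rising)

print(f"final training error:      {mse(w, X, y):.4f}")
print(f"best validation error:     {best_val:.4f}")
print(f"|best_w - teacher|^2 / N:  {np.sum((best_w - teacher) ** 2) / N:.4f}")
```

Under this quadratic error the weight-decay dynamics converge to a regularized least-squares fixed point, which is consistent with the equilibrium treatment being exact for weight decay, whereas early stopping selects a point along the transient trajectory.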
