Learning by on-line gradient descent

We study on-line gradient-descent learning in multilayer networks analytically and numerically. Training is based on randomly drawn inputs and the corresponding outputs defined by a target rule. In the thermodynamic limit we derive deterministic differential equations for the order parameters of the problem, which allow an exact calculation of the evolution of the generalization error. First we consider a single-layer perceptron with a sigmoidal activation function learning a target rule defined by a network of the same architecture. For this model the generalization error decays exponentially with the number of training examples, provided the learning rate is sufficiently small. However, if the learning rate is increased above a critical value, perfect learning is no longer possible. For architectures with hidden layers and fixed hidden-to-output weights, such as the parity and the committee machine, we find additional effects arising from the symmetries of these problems.

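To make the teacher-student setup concrete, here is a minimal numerical sketch (not taken from the paper) of on-line gradient descent for a single sigmoidal perceptron learning a teacher of the same architecture. It assumes the activation g(u) = erf(u/sqrt(2)) and the standard scaling conventions of this literature, under which the overlaps R = w.B/N and Q = w.w/N are the order parameters and the generalization error has a closed form; the specific values of N, eta, and the number of steps are illustrative choices.

```python
import numpy as np
from math import erf

def g(u):
    """Sigmoidal activation g(u) = erf(u / sqrt(2))."""
    return erf(u / np.sqrt(2.0))

def g_prime(u):
    """Derivative of g."""
    return np.sqrt(2.0 / np.pi) * np.exp(-0.5 * u * u)

def eps_g(R, Q, T=1.0):
    """Closed-form generalization error for erf activation, in terms of
    the order parameters R = w.B/N, Q = w.w/N, T = B.B/N."""
    return (np.arcsin(Q / (1.0 + Q)) + np.arcsin(T / (1.0 + T))
            - 2.0 * np.arcsin(R / np.sqrt((1.0 + Q) * (1.0 + T)))) / np.pi

N = 1000           # input dimension (the analysis holds for N -> infinity)
eta = 0.5          # learning rate, chosen safely below the critical value
steps = 20 * N     # number of examples; alpha = steps / N plays the role of time

rng = np.random.default_rng(0)
B = rng.standard_normal(N)
B *= np.sqrt(N) / np.linalg.norm(B)   # teacher weights, normalized so B.B/N = 1
w = 0.1 * rng.standard_normal(N)      # student starts almost uncorrelated with B

for mu in range(steps):
    xi = rng.standard_normal(N)       # randomly drawn input
    x = w @ xi / np.sqrt(N)           # student field
    y = B @ xi / np.sqrt(N)           # teacher field
    # one on-line gradient step on the example error (g(x) - g(y))^2 / 2
    w += (eta / np.sqrt(N)) * (g(y) - g(x)) * g_prime(x) * xi
    if mu % (2 * N) == 0:
        R, Q = w @ B / N, w @ w / N
        print(f"alpha = {mu / N:5.1f}   R = {R:+.4f}   Q = {Q:.4f}   "
              f"eps_g = {eps_g(R, Q):.5f}")
```

Run as-is, the printed eps_g should decay toward zero as alpha = mu/N grows, since eta is small; raising eta well above the critical value should keep R/sqrt(Q) bounded away from 1, so eps_g no longer vanishes. This qualitatively reproduces the transition between perfect and imperfect learning described above, though the quantitative critical learning rate is the result derived in the paper, not something this sketch establishes.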