Functional optimization of online algorithms in multilayer neural networks

We study the online dynamics of learning in fully connected soft committee machines in the student-teacher scenario. The locally optimal modulation function, which determines the learning algorithm, is obtained from a variational argument so as to maximize the average decay of the generalization error per example. Simulation results for the resulting algorithm are presented for a few cases. The symmetric-phase plateaux are found to be vastly reduced compared with those obtained with online backpropagation algorithms. We also discuss how these ideas can be implemented as practical algorithms.
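The student-teacher setting can be made concrete with a small simulation. This is a minimal sketch, assuming erf hidden units g(h) = erf(h/√2), Gaussian inputs, local fields h_i = J_i·x/√N, and the generic online update J_i ← J_i + (η/√N) F_i x, where F_i is the modulation function. The modulation used here is plain online backpropagation, F_i = (τ − σ) g'(h_i), which is the baseline the abstract compares against. It is not the locally optimal modulation derived in the paper. All names (`run`, `online_step`, `backprop_modulation`) and parameter values are hypothetical.

```python
import math
import random


def g(h):
    # Hidden-unit activation commonly used for soft committee machines (assumption)
    return math.erf(h / math.sqrt(2.0))


def g_prime(h):
    # Derivative of erf(h / sqrt(2)) with respect to h
    return math.sqrt(2.0 / math.pi) * math.exp(-h * h / 2.0)


def fields(weights, x):
    # Local fields h_i = J_i . x / sqrt(N) for every hidden unit
    n = len(x)
    return [sum(w * xi for w, xi in zip(row, x)) / math.sqrt(n) for row in weights]


def output(weights, x):
    # Soft committee machine: unweighted sum of the hidden-unit outputs
    return sum(g(h) for h in fields(weights, x))


def backprop_modulation(h_student, sigma, tau):
    # Baseline modulation F_i = (tau - sigma) * g'(h_i): plain online backpropagation,
    # NOT the locally optimal modulation function of the paper
    delta = tau - sigma
    return [delta * g_prime(h) for h in h_student]


def online_step(student, teacher, eta, modulation, rng):
    # One fresh Gaussian example, then J_i <- J_i + (eta / sqrt(N)) * F_i * x
    n = len(student[0])
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h = fields(student, x)
    sigma = sum(g(hi) for hi in h)
    tau = output(teacher, x)
    for row, f in zip(student, modulation(h, sigma, tau)):
        for j in range(n):
            row[j] += eta / math.sqrt(n) * f * x[j]


def generalization_error(student, teacher, test_inputs):
    # Monte Carlo estimate of eps_g = 1/2 <(sigma - tau)^2> on a fixed test set
    total = sum(0.5 * (output(student, x) - output(teacher, x)) ** 2 for x in test_inputs)
    return total / len(test_inputs)


def run(n=20, k=2, m=2, eta=0.3, alpha=300, seed=0, modulation=backprop_modulation):
    rng = random.Random(seed)
    # Teacher with M hidden units and random Gaussian weight vectors
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    # Student with K hidden units and small random initial weights
    student = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(k)]
    test_inputs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(1000)]
    eps_start = generalization_error(student, teacher, test_inputs)
    # alpha = number of examples per input dimension, so alpha * N online steps
    for _ in range(int(alpha * n)):
        online_step(student, teacher, eta, modulation, rng)
    eps_end = generalization_error(student, teacher, test_inputs)
    return eps_start, eps_end


if __name__ == "__main__":
    start, end = run()
    print(f"eps_g before: {start:.4f}, after: {end:.4f}")
```

Because the modulation function is passed in as a parameter, a different learning algorithm (for example, one built from the variational argument in the paper) could be tried by supplying another function with the same `(h_student, sigma, tau)` signature. The paper's optimal modulation may require additional inputs, such as order parameters, so this interface is only illustrative.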
