Benefits of gain: speeded learning and minimal hidden layers in back-propagation networks

The gain of a node in a connectionist network is a multiplicative constant that amplifies or attenuates the net input to the node. The benefits of adaptive gains in back-propagation networks are explored. It is shown that gradient descent with respect to gain greatly increases learning speed by amplifying those directions in weight space that are successfully chosen by gradient descent on the weights. Adaptive gains also allow normalization of weight vectors without loss of computational capacity, and the authors suggest a simple modification of the learning rule that automatically achieves weight normalization. A method for creating small hidden layers is described, in which hidden-node gains compete according to similarities between the nodes, with the aim of improving generalization performance. Simulations show that this competition method is more effective than the special case of gain decay.
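
The abstract only sketches the mechanism, so the following minimal NumPy sketch (not the authors' code; the layer sizes and the learning rates lr_w and lr_g are illustrative assumptions) shows what gradient descent with respect to per-node gains can look like: each node computes f(g * net), and both the weights and the gains receive gradient updates on every pass.

```python
# Minimal sketch of back-propagation with per-node adaptive gains.
# Each node j computes  out_j = sigmoid(g_j * net_j),  and both the
# weights and the gains g_j are updated by gradient descent on squared error.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy task: XOR, with a constant bias input appended to each pattern.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hid, n_out = 3, 4, 1
W1 = rng.normal(scale=0.5, size=(n_in, n_hid))   # input-to-hidden weights
W2 = rng.normal(scale=0.5, size=(n_hid, n_out))  # hidden-to-output weights
g1 = np.ones(n_hid)                              # hidden-node gains
g2 = np.ones(n_out)                              # output-node gains

lr_w, lr_g = 0.5, 0.1                            # illustrative learning rates

for epoch in range(5000):
    # Forward pass: the net input is scaled by the node's gain before the squash.
    net1 = X @ W1
    h = sigmoid(g1 * net1)
    net2 = h @ W2
    y = sigmoid(g2 * net2)

    err = y - T                                  # dE/dy for E = 0.5 * sum(err**2)

    # Output layer: delta with respect to the gain-scaled net input.
    d2 = err * y * (1 - y)                       # dE/d(g2 * net2)
    dW2 = h.T @ (d2 * g2)                        # dE/dW2
    dg2 = np.sum(d2 * net2, axis=0)              # dE/dg2

    # Hidden layer: back-propagate through the gain-scaled output weights.
    d1 = (d2 * g2) @ W2.T * h * (1 - h)          # dE/d(g1 * net1)
    dW1 = X.T @ (d1 * g1)
    dg1 = np.sum(d1 * net1, axis=0)

    W1 -= lr_w * dW1
    W2 -= lr_w * dW2
    g1 -= lr_g * dg1
    g2 -= lr_g * dg2
```

A gain-decay or competition penalty added to dg1 would push redundant hidden nodes toward zero gain, which is the pruning idea the abstract refers to; the specific competition rule used in the paper (based on similarities between hidden nodes) is not reproduced in this sketch.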
