Improving generalization in backpropagation networks with distributed bottlenecks

The primary goal of any adaptive system that learns by example is to generalize from the training examples to novel inputs. The backpropagation learning algorithm is popular for its simplicity and for landmark cases of successful generalization. It has been observed that backpropagation networks sometimes generalize better when they contain a hidden layer with considerably fewer units than the preceding layers. The functional properties of such hidden-layer bottlenecks are analyzed, and a method for dynamically creating them, concurrent with backpropagation learning, is described. The method does not excise hidden units; rather, it compresses the dimensionality of the space spanned by the hidden-unit weight vectors and forms clusters of weight vectors in the low-dimensional space. The result is a functional bottleneck distributed across many units. The method is a gradient descent procedure, using local computations on simple lateral Hebbian connections between hidden units.
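To make the idea concrete, the following is a minimal sketch of one way such a distributed bottleneck could be implemented, assuming a single hidden layer with sigmoid units and squared-error backpropagation. The lateral term here (and names such as `lateral_strength`) are illustrative assumptions, not the paper's exact update rule: co-active hidden units attract each other's incoming weight vectors, which tends to cluster the weight vectors and compress the dimensionality they span.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only).
n_in, n_hidden, n_out = 8, 12, 2
W1 = rng.normal(scale=0.3, size=(n_hidden, n_in))   # input -> hidden weights
W2 = rng.normal(scale=0.3, size=(n_out, n_hidden))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, t, lr=0.1, lateral_strength=0.01):
    """One combined update: standard backpropagation plus a hypothetical
    lateral Hebbian term that clusters hidden-unit weight vectors."""
    global W1, W2
    # Forward pass.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # Standard backprop gradients (squared error, sigmoid units).
    delta_out = (y - t) * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)

    # Lateral Hebbian term (assumed form): hidden units whose activities
    # correlate on this example pull each other's incoming weight vectors
    # together, so the weight vectors form clusters and span a
    # lower-dimensional space -- a bottleneck distributed over many units.
    co_activity = np.outer(h - h.mean(), h - h.mean())
    np.fill_diagonal(co_activity, 0.0)
    attraction = co_activity @ W1 - co_activity.sum(axis=1, keepdims=True) * W1
    W1 += lateral_strength * attraction
    return y
```

In this sketch the lateral update for unit i is a sum over j of co_activity[i, j] * (W1[j] - W1[i]), i.e. a purely local pull toward the weight vectors of correlated units, consistent with the abstract's description of local computations on lateral Hebbian connections.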
