Capacity of feedforward networks with shared weights

In pattern recognition it is well known that the number of free parameters of a classification function should not be too large, since these parameters must be estimated from a finite learning set. For multilayer feedforward network classifiers, this implies that the number of weights and units should be limited. However, a fundamentally different approach to decreasing the number of free parameters in such networks, suggested by Rumelhart and applied by LeCun, is to share the same weights among multiple units. This technique was motivated by the fact that it yields translation invariance. In this paper we discuss how weight sharing influences the capacity, or Vapnik-Chervonenkis dimension, of the network. First, an upper bound is derived for the number of dichotomies that can be induced by a layer of units with shared weights. We then apply this result to bound the capacity of a simple class of weight-sharing networks. The results show that the capacity of a network with shared weights is still linear in the number of free parameters. Another remarkable outcome is that either weight sharing is a very effective way of decreasing the capacity of a network, or the existing bounds on the capacity of multilayer feedforward networks considerably overestimate it.
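For context, the classical background result here is Cover's: a single linear threshold unit with d adjustable weights can induce at most C(N, d) = 2 * sum_{k=0}^{d-1} binom(N-1, k) dichotomies of N points in general position, which is why capacity arguments revolve around counting free parameters; the paper's contribution is an analogous bound when units share weights. The sketch below is not from the paper; it is a minimal illustration, with assumed layer sizes, of the parameter-count contrast between a fully connected layer and a shared-weight (convolution-style) layer that motivates the analysis.

    # Minimal sketch (illustrative assumptions, not the paper's construction):
    # counting free parameters in a fully connected layer versus a layer
    # whose units all share one window of weights.

    def fully_connected_params(n_inputs: int, n_units: int) -> int:
        # Each unit has its own weight vector plus its own bias.
        return n_units * (n_inputs + 1)

    def shared_weight_params(window: int) -> int:
        # All units apply the same window of weights (plus one shared bias)
        # at different input positions, so the count is independent of the
        # number of units; this is what makes translation invariance cheap.
        return window + 1

    if __name__ == "__main__":
        n_inputs, window = 64, 5
        n_units = n_inputs - window + 1  # one unit per window position
        print(fully_connected_params(n_inputs, n_units))  # 3900
        print(shared_weight_params(window))               # 6

With 60 units over a 64-dimensional input, the fully connected layer has 3900 free parameters while the shared-weight layer has only 6, independent of the number of units; it is the effect of this reduction on capacity that the paper quantifies.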
