Redundancy Reduction as a Strategy for Unsupervised Learning

A redundancy reduction strategy, which can be applied in stages, is proposed as a way to learn the statistical properties of an ensemble of sensory messages as efficiently as possible. The method works best for inputs consisting of strongly correlated groups of components, that is, features, with weaker statistical dependence between different features. This is the case for localized objects in an image or for words in a text. A local measure of how much a single feature reduces the total redundancy is derived; it turns out to depend only on the probability of the feature and of its components, not on the statistical properties of any other feature. The locality of this measure makes it an ideal basis for a "neural" implementation of redundancy reduction, and a very simple non-Hebbian algorithm is given as an example. The effect of noise on learning redundancy is also discussed.
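
The following is a minimal Python sketch of such a staged scheme, assuming text input and adjacent symbol pairs as candidate features. The score used here is frequency-weighted pointwise mutual information, which, as the abstract requires, depends only on the probability of the feature and of its components; it is an illustrative stand-in rather than the paper's exact measure, and the names feature_gain, merge_pair, and learn_in_stages are hypothetical.

```python
from collections import Counter
import math


def feature_gain(corpus, pair):
    """Local redundancy-reduction score for one candidate feature.

    Depends only on the probability of the pair and of its two
    components, not on any other feature (the locality property
    described in the abstract). Concretely: frequency-weighted
    pointwise mutual information, in bits saved per input symbol.
    This is an assumed stand-in for the paper's measure.
    """
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    n = len(corpus)
    p_f = bigrams[pair] / (n - 1)
    if p_f == 0:
        return 0.0
    p_a = unigrams[pair[0]] / n
    p_b = unigrams[pair[1]] / n
    return p_f * math.log2(p_f / (p_a * p_b))


def merge_pair(corpus, pair, new_symbol):
    """Recode the corpus with the learned feature as a single symbol."""
    out, i = [], 0
    while i < len(corpus):
        if i + 1 < len(corpus) and (corpus[i], corpus[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(corpus[i])
            i += 1
    return out


def learn_in_stages(corpus, n_stages=5):
    """Greedy staged redundancy reduction: at each stage adopt the
    single feature with the largest local gain, then recode and repeat."""
    for stage in range(n_stages):
        bigrams = Counter(zip(corpus, corpus[1:]))
        if not bigrams:
            break
        best = max(bigrams, key=lambda p: feature_gain(corpus, p))
        gain = feature_gain(corpus, best)
        if gain <= 0:
            break  # no remaining feature reduces redundancy
        corpus = merge_pair(corpus, best, best[0] + best[1])
        print(f"stage {stage}: merged {best!r}, gain {gain:.4f} bits/symbol")
    return corpus


if __name__ == "__main__":
    text = list("the cat sat on the mat the cat ran")
    learn_in_stages(text, n_stages=4)
```

Because the gain of each candidate feature is computed from its own statistics alone, each stage needs only local information, which is what makes a simple "neural" (here, non-Hebbian counting) implementation plausible.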
