I propose a novel general principle for unsupervised learning of distributed non-redundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor which tries to predict the unit from the remaining units. In turn, each unit tries to react to the environment such that it minimizes its predictability. This encourages each unit to filter 'abstract concepts' out of the environmental input such that these concepts are statistically independent of those upon which the other units focus. I discuss various simple yet potentially powerful implementations of the principle which aim at finding binary factorial codes (Barlow et al., 1989), i.e. codes where the probability of the occurrence of a particular input is simply the product of the probabilities of the corresponding code symbols. Such codes are potentially relevant for (1) segmentation tasks, (2) speeding up supervised learning, (3) novelty detection. Methods for finding factorial codes automatically implement Occam's razor for finding codes using a minimal number of units. Unlike previous methods, the novel principle has a potential for removing not only linear but also non-linear output redundancy. Illustrative experiments show that algorithms based on the principle of predictability minimization are practically feasible. The final part of this paper describes an entirely local algorithm that has a potential for learning unique representations of extended input sequences.
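The two opposing forces described above can be sketched as an alternating gradient scheme: predictors descend on the squared prediction error while code units ascend on it. The following is a minimal NumPy sketch under assumed simplifications (a single linear predictor map, a sigmoid code layer, toy input data with redundant bits), not the paper's exact architecture or objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy input: two independent bits plus two redundant
# features derived from them (XOR and a negation).
def make_data(n=200):
    b = rng.integers(0, 2, size=(n, 2))
    x = np.column_stack([b[:, 0], b[:, 1], b[:, 0] ^ b[:, 1], 1 - b[:, 0]])
    return x.astype(float)

X = make_data()
n_in, n_code = X.shape[1], 2

# Code layer: y = sigmoid(X W + c). One adaptive predictor per code unit;
# here column j of V predicts unit j linearly from the other units.
W = rng.normal(0.0, 0.1, (n_in, n_code)); c = np.zeros(n_code)
V = rng.normal(0.0, 0.1, (n_code, n_code)); np.fill_diagonal(V, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

lr = 0.5
for step in range(2000):
    y = sigmoid(X @ W + c)          # code unit activations in (0, 1)
    p = y @ V                       # prediction of each unit from the others
    err = p - y                     # per-unit prediction errors

    # Force 1: predictors descend on the squared prediction error.
    V -= lr * (y.T @ err) / len(X)
    np.fill_diagonal(V, 0.0)        # a unit must not predict itself

    # Force 2: code units ascend on the same error (predictability
    # minimization), with the predictors held fixed.
    dy = (err @ V.T) - err          # d(0.5 * err^2) / dy
    dz = dy * y * (1.0 - y)         # backprop through the sigmoid
    W += lr * (X.T @ dz) / len(X)
    c += lr * dz.mean(axis=0)

y = sigmoid(X @ W + c)              # final code for the training inputs
```

Under this adversarial pressure, each unit is pushed toward features that the remaining units carry no information about; with as many units as independent input bits, the stable solutions are codes whose components are statistically independent, i.e. factorial codes.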
[1] P. Werbos et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.
[2] Barak A. Pearlmutter et al. G-maximization: An unsupervised learning procedure for discovering regularities, 1987.
[3] Ralph Linsker et al. Self-organization in a perceptual network, 1988, Computer.
[4] Terence D. Sanger et al. An Optimality Principle for Unsupervised Learning, 1988, NIPS.
[5] Erkki Oja et al. Neural Networks, Principal Components, and Subspaces, 1989, Int. J. Neural Syst.
[6] H. B. Barlow et al. Finding Minimum Entropy Codes, 1989, Neural Computation.
[7] J. Rubner et al. Development of feature detectors by self-organization: A network model, 1990, Biological Cybernetics.
[8] Geoffrey E. Hinton et al. Discovering Viewpoint-Invariant Relationships That Characterize Objects, 1990, NIPS.
[9] Suzanna Becker et al. Unsupervised Learning Procedures for Neural Networks, 1991, Int. J. Neural Syst.
[10] Jürgen Schmidhuber et al. Learning Unambiguous Reduced Sequence Descriptions, 1991, NIPS.
[11] Jürgen Schmidhuber et al. Learning Complex, Extended Sequences Using the Principle of History Compression, 1992, Neural Computation.