It is possible to combine multiple probabilistic models of the same data by multiplying their probabilities together and then renormalizing. This is a very efficient way to model high-dimensional data that simultaneously satisfies many different low-dimensional constraints. Each individual expert model can focus on giving high probability to data vectors that satisfy just one of the constraints. Data vectors that satisfy this one constraint but violate other constraints will be ruled out by their low probability under the other expert models. Training a product of models appears difficult because, in addition to maximizing the probabilities that the individual models assign to the observed data, it is necessary to make the models disagree on unobserved regions of the data space. However, if the individual models are tractable, there is a fairly efficient way to train a product of models. This training algorithm suggests a biologically plausible way of learning neural population codes.
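As a minimal illustration of the combination rule (a sketch only, not taken from the paper: the toy data space, the two hypothetical experts, and all variable names below are invented for the example), the following Python snippet multiplies the probabilities of two expert distributions over a small discrete space pointwise and renormalizes. Each expert gives high probability to vectors that satisfy its own constraint; the product concentrates probability on vectors that satisfy both.

    import numpy as np

    # Toy data space: all 4-bit binary vectors (16 points).
    data_space = np.array([[int(b) for b in format(i, "04b")] for i in range(16)])

    # Hypothetical expert 1: prefers vectors whose first two bits are equal.
    expert1 = np.where(data_space[:, 0] == data_space[:, 1], 4.0, 1.0)
    expert1 /= expert1.sum()

    # Hypothetical expert 2: prefers vectors with exactly two bits switched on.
    expert2 = np.where(data_space.sum(axis=1) == 2, 4.0, 1.0)
    expert2 /= expert2.sum()

    # Product of experts: multiply the probabilities pointwise, then
    # renormalize so the result is again a distribution over the data space.
    product = expert1 * expert2
    product /= product.sum()

    # Vectors satisfying both constraints receive the highest product
    # probability; vectors satisfying only one are pulled down by the other expert.
    for x, p1, p2, pp in zip(data_space, expert1, expert2, product):
        print(x, round(p1, 3), round(p2, 3), round(pp, 3))

Running the sketch shows the effect described above: the renormalized product assigns its largest probabilities only to the vectors that every expert rates highly.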