Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines

To allow the hidden units of a restricted Boltzmann machine to model the transformation between two successive images, Memisevic and Hinton (2007) introduced three-way multiplicative interactions that use the intensity of a pixel in the first image as a multiplicative gain on a learned, symmetric weight between a pixel in the second image and a hidden unit. This creates cubically many parameters, which form a three-dimensional interaction tensor. We describe a low-rank approximation to this interaction tensor that uses a sum of factors, each of which is a three-way outer product. This approximation allows efficient learning of transformations between larger image patches. Since each factor can be viewed as an image filter, the model as a whole learns optimal filter pairs for efficiently representing transformations. We demonstrate the learning of optimal filter pairs from various synthetic and real image sequences. We also show how learning about image transformations allows the model to perform a simple visual analogy task, and we show how a completely unsupervised network trained on transformations perceives multiple motions of transparent dot patterns in the same way as humans.
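The low-rank approximation described above can be illustrated with a minimal NumPy sketch. It assumes the interaction tensor is written as W[i, j, k] = sum over f of Wx[i, f] * Wy[j, f] * Wh[k, f]. The names `Wx`, `Wy`, `Wh` and the sizes are hypothetical choices for illustration, not notation taken from the paper. Biases, the hidden-unit nonlinearity, and learning are omitted. The sketch only checks that the total input to the hidden units computed from filter responses matches the input computed from the full tensor.

```python
import numpy as np

def full_tensor(Wx, Wy, Wh):
    # Sum of F three-way outer products: W[i, j, k] = sum_f Wx[i,f] * Wy[j,f] * Wh[k,f]
    return np.einsum("if,jf,kf->ijk", Wx, Wy, Wh)

def hidden_input_full(x, y, W):
    # Unfactored three-way interaction: a[k] = sum_ij x[i] * y[j] * W[i,j,k]
    return np.einsum("i,j,ijk->k", x, y, W)

def hidden_input_factored(x, y, Wx, Wy, Wh):
    # Filter both images, multiply the paired filter responses,
    # then project the products onto the hidden units.
    fx = x @ Wx          # responses of the F filters applied to the first image
    fy = y @ Wy          # responses of the F filters applied to the second image
    return Wh @ (fx * fy)

# Hypothetical sizes: I input pixels, J output pixels, K hidden units, F factors.
rng = np.random.default_rng(0)
I, J, K, F = 16, 16, 8, 5
Wx = rng.normal(size=(I, F))
Wy = rng.normal(size=(J, F))
Wh = rng.normal(size=(K, F))
x = rng.normal(size=I)
y = rng.normal(size=J)

W = full_tensor(Wx, Wy, Wh)
a_full = hidden_input_full(x, y, W)
a_fact = hidden_input_factored(x, y, Wx, Wy, Wh)

# Parameter counts: cubic for the full tensor, linear in each dimension when factored.
n_full = I * J * K
n_fact = F * (I + J + K)
```

Under these assumptions the factored form stores F * (I + J + K) weights instead of I * J * K, and the columns of `Wx` and `Wy` play the role of the filter pairs mentioned in the abstract.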

[1]  Douglas Hofstadter,et al.  The Copycat Project: An Experiment in Nondeterminism and Creative Analogies , 1984 .

[2]  T. Sejnowski Higher‐order Boltzmann machines , 1987 .

[3]  Peter Földiák,et al.  Learning Invariance from Transformation Sequences , 1991, Neural Comput..

[4]  Patrice Y. Simard,et al.  An efficient algorithm for learning invariance in adaptive classifiers , 1992, Proceedings, 11th IAPR International Conference on Pattern Recognition, Vol. II, Conference B: Pattern Recognition Methodology and Systems.

[5]  B. Olshausen Neural routing circuits for forming invariant representations of visual objects , 1994 .

[6]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[7]  Rajesh P. N. Rao,et al.  Efficient Encoding of Natural Time Varying Images Produces Oriented Space-Time Receptive Fields , 1997 .

[8]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[9]  Rajesh P. N. Rao,et al.  Learning Lie Groups for Invariant Visual Perception , 1998, NIPS.

[10]  D. Ruderman,et al.  Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex , 1998 .

[11]  Joshua B. Tenenbaum,et al.  Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[12]  Demetri Terzopoulos,et al.  Multilinear Analysis of Image Ensembles: TensorFaces , 2002, ECCV.

[13]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[14]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[15]  Rajesh P. N. Rao,et al.  Bilinear Sparse Coding for Invariant Vision , 2005, Neural Computation.

[16]  Geoffrey E. Hinton,et al.  Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  B. Schölkopf,et al.  Modeling Human Motion Using Binary Latent Variables , 2007 .

[18]  Bruno A. Olshausen,et al.  Bilinear models of natural images , 2007, Electronic Imaging.

[19]  Rajesh P. N. Rao,et al.  Learning the Lie Groups of Visual Invariance , 2007, Neural Computation.

[20]  Roland Memisevic Non-linear latent factor models for revealing structure in high-dimensional data , 2008 .

[21]  Ruslan Salakhutdinov,et al.  Learning deep generative models , 2009 .

[22]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order Boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Christoph von der Malsburg,et al.  Self-Organization of Topographic Bilinear Networks for Invariant Recognition , 2011, Neural Computation.