Feature grouping from spatially constrained multiplicative interaction

We present a feature learning model that learns to encode relationships between images. The model is defined as a Gated Boltzmann Machine, which is constrained such that hidden units that are nearby in space can gate each other's connections. We show how frequency/orientation "columns" as well as topographic filter maps follow naturally from training the model on image pairs. The model also helps explain why square-pooling models yield feature groups with similar grouping properties. Experimental results on synthetic image transformations show that spatially constrained gating is an effective way to reduce the number of parameters and thereby to regularize a transformation-learning model.

[1]  Geoffrey E. Hinton,et al.  Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Julien Mairal,et al.  Proximal Methods for Sparse Hierarchical Dictionary Learning , 2010, ICML.

[3]  Aapo Hyvärinen,et al.  Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.

[4]  J. Chang,et al.  Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition , 1970 .

[5]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[6]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[7]  Charles F. Stevens,et al.  A Universal Design Principle for Visual System Pinwheels , 2011, Brain, Behavior and Evolution.

[8]  David J. Fleet,et al.  Neural encoding of binocular disparity: Energy models, position shifts and phase shifts , 1996, Vision Research.

[9]  Bartlett W. Mel,et al.  Toward a Single-Cell Account for Binocular Disparity Tuning: An Energy Model May Be Hiding in Your Dendrites , 1997, NIPS.

[10]  Geoffrey E. Hinton,et al.  Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.

[11]  R. Fergus,et al.  Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Rafael C. González,et al.  Digital image processing, 3rd Edition , 2008 .

[13]  Roland Memisevic,et al.  Gradient-based learning of higher-order image features , 2011, 2011 International Conference on Computer Vision.

[14]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[15]  Aapo Hyvärinen,et al.  Topographic ICA as a Model of Natural Image Statistics , 2000, Biologically Motivated Computer Vision.

[16]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Aapo Hyvärinen,et al.  Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[18]  D. Ruderman,et al.  Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[19]  I. Ohzawa,et al.  Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. , 1990, Science.

[20]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[21]  Roland Memisevic,et al.  On multi-view feature learning , 2012, ICML.

[22]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[23]  Nando de Freitas,et al.  On Autoencoders and Score Matching for Energy Based Models , 2011, ICML.

[24]  E H Adelson,et al.  Spatiotemporal energy models for the perception of motion. , 1985, Journal of the Optical Society of America. A, Optics and image science.