Learning to Relate Images

A fundamental operation in many vision tasks, including motion understanding, stereopsis, visual odometry, and invariant recognition, is establishing correspondences between images or between images and data from other modalities. Recently, there has been increasing interest in learning to infer correspondences from data using relational, spatiotemporal, and bilinear variants of deep learning methods. These methods use multiplicative interactions between pixels or between features to represent correlation patterns across multiple images. In this paper, we review recent work on relational feature learning and analyze the role that multiplicative interactions play in learning to encode relations. We also discuss how square-pooling and complex cell models can be viewed as a way to represent multiplicative interactions, and thereby as a way to encode relations.
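
The link between square-pooling and multiplicative interactions can be illustrated with a small numerical sketch. It is not taken from the paper: the filter names w and v, the toy "images" x and y, and the helper functions are all hypothetical. It only checks the algebraic identity (w.x + v.y)^2 = (w.x)^2 + (v.y)^2 + 2 (w.x)(v.y), which shows that squaring a sum of filter responses on two images implicitly contains a product of responses across the images.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def square_pooled_response(w, x, v, y):
    """Square of the summed responses of filter w on image x and filter v on image y."""
    return (dot(w, x) + dot(v, y)) ** 2


def multiplicative_interaction(w, x, v, y):
    """Product of the two filter responses (the cross-image term)."""
    return dot(w, x) * dot(v, y)


# Hypothetical toy data: two 16-pixel "images" and two random filters.
rng = random.Random(0)
n = 16
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]

pooled = square_pooled_response(w, x, v, y)
expanded = (
    dot(w, x) ** 2
    + dot(v, y) ** 2
    + 2 * multiplicative_interaction(w, x, v, y)
)

# The squared sum equals the two within-image terms plus twice the cross term.
print(math.isclose(pooled, expanded, rel_tol=1e-9))
```

The two squared terms depend on one image each; only the cross term depends on how the two images relate, which is why a squared, pooled response can carry relational information under these assumptions.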
