Gated Autoencoders with Tied Input Weights

The semantic interpretation of images is one of the core applications of deep learning. Several techniques have been recently proposed to model the relation between two images, with application to pose estimation, action recognition or invariant object recognition. Among these techniques, higher-order Boltzmann machines or relational autoencoders consider projections of the images on different subspaces and intermediate layers act as transformation specific detectors. In this work, we extend the mathematical study of (Memisevic, 2012b) to show that it is possible to use a unique projection for both images in a way that turns intermediate layers as spectrum encoders of transformations. We show that this results in networks that are easier to tune and have greater generalization capabilities.

[1]  Misha Denil,et al.  Learning Where to Attend with Deep Architectures for Image Tracking , 2011, Neural Computation.

[2]  Geoffrey E. Hinton,et al.  Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.

[3]  Geoffrey E. Hinton,et al.  Modeling the joint density of two images under a variety of transformations , 2011, CVPR 2011.

[4]  Pierre Baldi,et al.  Complex-Valued Autoencoders , 2011, Neural Networks.

[5]  Matthias Bethge,et al.  Unsupervised learning of a steerable basis for invariant image representations , 2007, Electronic Imaging.

[6]  Geoffrey E. Hinton,et al.  Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[8]  Geoffrey E. Hinton,et al.  Learning to combine foveal glimpses with a third-order Boltzmann machine , 2010, NIPS.

[9]  Geoffrey E. Hinton,et al.  Learning Multilevel Distributed Representations for High-Dimensional Sequences , 2007, AISTATS.

[10]  Pierre Baldi,et al.  Autoencoders, Unsupervised Learning, and Deep Architectures , 2011, ICML Unsupervised and Transfer Learning.

[11]  Christian Wolf,et al.  Sequential Deep Learning for Human Action Recognition , 2011, HBU.

[12]  Geoffrey E. Hinton,et al.  Two Distributed-State Models For Generating High-Dimensional Time Series , 2011, J. Mach. Learn. Res..

[13]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[14]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[15]  Roland Memisevic Learning to relate images: Mapping units, complex cells and simultaneous eigenspaces , 2011, ArXiv.

[16]  Giulio Sandini,et al.  Learning about objects through action - initial steps towards artificial cognition , 2003, 2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422).

[17]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[18]  Roland Memisevic,et al.  On multi-view feature learning , 2012, ICML.

[19]  Manuel Lopes,et al.  Learning Object Affordances: From Sensory--Motor Coordination to Imitation , 2008, IEEE Transactions on Robotics.