Transformation Properties of Learned Visual Representations

When a three-dimensional object moves relative to an observer, a change occurs on the observer's image plane and in the visual representation computed by a learned model. Starting with the idea that a good visual representation is one that transforms linearly under scene motions, we show, using the theory of group representations, that any such representation is equivalent to a combination of the elementary irreducible representations. We derive a striking relationship between irreducibility and the statistical dependency structure of the representation, by showing that under restricted conditions, irreducible representations are decorrelated. Under partial observability, as induced by the perspective projection of a scene onto the image plane, the motion group does not have a linear action on the space of images, so that it becomes necessary to perform inference over a latent representation that does transform linearly. This idea is demonstrated in a model of rotating NORB objects that employs a latent representation of the non-commutative 3D rotation group SO(3).
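Two facts invoked in the abstract can be made concrete with a minimal NumPy sketch (this is not the paper's model; the cyclic group, signal length, and rotation angles below are arbitrary illustrative choices). Part (i) uses the discrete shift group as a stand-in for planar rotation: its linear action on signals becomes diagonal in the Fourier basis, i.e., it splits into one-dimensional irreducible pieces, and for shift-invariant (stationary) signal statistics those pieces are decorrelated because the circulant covariance is diagonalized by the DFT. Part (ii) verifies that 3D rotations in SO(3) do not commute.

```python
import numpy as np

# (i) The cyclic group Z_N acts linearly on length-N signals by circular shift.
# In the Fourier basis this action is diagonal: each frequency k is a
# one-dimensional irreducible representation, and a shift by s multiplies
# frequency k by exp(-2*pi*i*k*s/N)  (the DFT shift theorem).
N, s = 8, 3
x = np.random.randn(N)
shifted = np.roll(x, s)                      # y[n] = x[(n - s) mod N]

k = np.arange(N)
phases = np.exp(-2j * np.pi * k * s / N)     # per-frequency (irreducible) action
assert np.allclose(np.fft.fft(shifted), phases * np.fft.fft(x))

# For an ensemble of signals with shift-invariant statistics, the covariance is
# circulant, so the Fourier coefficients are uncorrelated -- a discrete analogue
# of the abstract's claim that, under restricted conditions, irreducible
# representations are decorrelated.

# (ii) SO(3) is non-commutative: a rotation about the x-axis followed by one
# about the z-axis differs from the same rotations applied in reverse order.
def rot_x(a):
    c, s_ = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s_],
                     [0.0,  s_,    c]])

def rot_z(a):
    c, s_ = np.cos(a), np.sin(a)
    return np.array([[  c,  -s_, 0.0],
                     [ s_,    c, 0.0],
                     [0.0,  0.0, 1.0]])

A, B = rot_x(0.7), rot_z(0.4)
print(np.allclose(A @ B, B @ A))             # False: order of 3D rotations matters
```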
