Harmonic Networks: Deep Translation and Rotation Equivariance

Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360-rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient and fixed computational complexity representation, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.

[1]  Koray Kavukcuoglu,et al.  Exploiting Cyclic Symmetry in Convolutional Neural Networks , 2016, ICML.

[2]  Stéphane Mallat,et al.  Deep roto-translation scattering for object classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Qing Wang,et al.  2D/3D rotation-invariant detection using equivariant filters and kernel weighted mapping , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[5]  Edward H. Adelson,et al.  The Design and Use of Steerable Filters , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Stefan Roth,et al.  Learning rotation-aware features: From invariant priors to equivariant descriptors , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Max Welling,et al.  Group Equivariant Convolutional Networks , 2016, ICML.

[8]  Joachim M. Buhmann,et al.  TI-POOLING: Transformation-Invariant Pooling for Feature Learning in Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Tyng-Luh Liu,et al.  Pixel-wise Deep Learning for Contour Detection , 2015, ICLR.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[12]  Yoshua Bengio,et al.  An empirical evaluation of deep architectures on problems with many factors of variation , 2007, ICML '07.

[13]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[14]  Andrea Vedaldi,et al.  Learning Covariant Feature Detectors , 2016, ECCV Workshops.

[15]  Honglak Lee,et al.  Learning Invariant Representations with Local Transformations , 2012, ICML.

[16]  Arnold W. M. Smeulders,et al.  Structured Receptive Fields in CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Christopher K. I. Williams,et al.  Visual Boundary Prediction: A Deep Neural Prediction Network and Quality Dissection , 2014, AISTATS.

[18]  Mark Tygert,et al.  A Mathematical Motivation for Complex-Valued Convolutional Networks , 2015, Neural Computation.

[19]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[20]  Beat Fasel,et al.  Rotation-Invariant Neoperceptron , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[21]  Michele Volpi,et al.  Learning rotation invariant convolutional filters for texture classification , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[22]  Iasonas Kokkinos,et al.  Pushing the Boundaries of Boundary Detection using Deep Learning , 2015, ICLR 2016.

[23]  Victor S. Lempitsky,et al.  N4-Fields: Neural Network Nearest Neighbor Fields for Image Transforms , 2014, ArXiv.

[24]  Yan Wang,et al.  DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[27]  Andrea Vedaldi,et al.  Understanding Image Representations by Measuring Their Equivariance and Equivalence , 2014, International Journal of Computer Vision.

[28]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[29]  Max Welling,et al.  Steerable CNNs , 2016, ICLR.

[30]  Geoffrey E. Hinton,et al.  Transforming Auto-Encoders , 2011, ICANN.

[31]  Jianbo Shi,et al.  DeepEdge: A multi-scale bifurcated deep network for top-down contour detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Stefano Soatto,et al.  Actionable information in vision , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[34]  Geoffrey E. Hinton,et al.  Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.