Dynamic Steerable Blocks in Deep Residual Networks

Filters in convolutional networks are typically parameterized in a pixel basis that does not take prior knowledge about the visual world into account. We investigate the generalized notion of frames, designed with image properties in mind, as alternatives to this parameterization. We show that frame-based ResNets and DenseNets consistently improve performance on CIFAR-10+ while offering additional desirable properties such as steerability. By exploiting these transformation properties explicitly, we arrive at dynamic steerable blocks. They extend residual blocks and can seamlessly transform filters under pre-defined transformations, conditioned on the input at both training and inference time. Dynamic steerable blocks learn the degree of invariance from data and adapt filters locally, which allows them to apply a different geometric variant of the same filter at each location of the feature map. When evaluated on the Berkeley Segmentation contour detection dataset, our approach outperforms all competing approaches that do not use pre-training. Our results highlight the benefits of image-based regularization for deep networks.
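To make the idea of steering a frame-parameterized filter concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-element frame of first-order Gaussian derivatives, and it takes the per-location angles as given, whereas the abstract describes them as conditioned on the input. All function names (gaussian_derivative_basis, steer, effective_filter, per_location_filters) and the parameters size and sigma are hypothetical and chosen for illustration only.

```python
import numpy as np


def gaussian_derivative_basis(size=5, sigma=1.0):
    """Return the frame elements (G_x, G_y) as an array of shape (2, size, size).

    Assumption: a first-order Gaussian derivative frame, used here only
    because it is a standard example of a steerable basis.
    """
    r = np.arange(size) - (size - 1) / 2.0
    y, x = np.meshgrid(r, r, indexing="ij")  # y varies along rows, x along columns
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return np.stack([-x * g, -y * g]) / sigma**2


def steer(coeffs, theta):
    """Rotate the frame coefficients by theta (radians).

    For this frame, rotating the filter in the (x, y) grid coordinates is
    the same as applying a 2x2 rotation matrix to its two coefficients.
    """
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.asarray(coeffs, dtype=float)


def effective_filter(coeffs, basis):
    """Linearly combine the frame elements into a single pixel-space filter."""
    return np.tensordot(np.asarray(coeffs, dtype=float), basis, axes=1)


def per_location_filters(coeffs, thetas, basis):
    """Build one steered filter per feature-map location.

    thetas has shape (H, W); the result has shape (H, W, size, size).
    In this sketch the angles are supplied by the caller.
    """
    a, b = coeffs
    c, s = np.cos(thetas), np.sin(thetas)
    steered = np.stack([c * a - s * b, s * a + c * b], axis=-1)  # (H, W, 2)
    return np.tensordot(steered, basis, axes=([-1], [0]))


if __name__ == "__main__":
    basis = gaussian_derivative_basis(size=5, sigma=1.0)
    coeffs = np.array([0.7, -0.2])           # stand-in for learned frame coefficients
    thetas = np.linspace(0.0, np.pi, 6).reshape(2, 3)
    filters = per_location_filters(coeffs, thetas, basis)
    print(filters.shape)
```

The design point of the sketch is that the network only stores frame coefficients, so a geometric variant of a filter is obtained by a small linear map on those coefficients rather than by resampling the pixel grid.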
