Oriented Response Networks

Deep Convolutional Neural Networks (DCNNs) are capable of learning unprecedentedly effective image representations. However, their ability to handle significant local and global image rotations remains limited. In this paper, we propose Active Rotating Filters (ARFs) that actively rotate during convolution and produce feature maps with location and orientation explicitly encoded. An ARF acts as a virtual filter bank containing the filter itself and multiple unmaterialised rotated versions of it. During back-propagation, an ARF is collectively updated using the errors from all of its rotated versions. DCNNs using ARFs, referred to as Oriented Response Networks (ORNs), can produce within-class rotation-invariant deep features while maintaining inter-class discrimination for classification tasks. The oriented responses produced by ORNs can also be used for image and object orientation estimation tasks. Across multiple state-of-the-art DCNN architectures, such as VGG, ResNet, and STN, we consistently observe that replacing regular filters with the proposed ARFs leads to a significant reduction in the number of network parameters and an improvement in classification performance. We report the best results on several commonly used benchmarks.
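The idea of one stored filter acting as a virtual bank of rotated copies, and being updated from the errors of all of them, can be illustrated with a small sketch. This is not the implementation from the paper: it assumes a single-channel square filter, only four orientations at multiples of 90 degrees (so that rotation is exact), and plain Python lists instead of a deep learning framework. The names `rot90`, `rotate`, `correlate_valid`, `arf_forward` and `arf_backward` are hypothetical and chosen only for this illustration.

```python
def rot90(m):
    """Rotate a matrix (list of rows) 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*m)][::-1]


def rotate(m, k):
    """Apply rot90 k times; k may be negative, only k mod 4 matters."""
    for _ in range(k % 4):
        m = rot90(m)
    return m


def correlate_valid(img, w):
    """Plain 'valid' 2-D cross-correlation of img with kernel w."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * w[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def arf_forward(img, w, n_orient=4):
    """Return one response map per orientation.

    Only w is stored; its rotated copies are generated on the fly,
    which is the sense in which the filter bank is 'virtual'.
    """
    return [correlate_valid(img, rotate(w, k)) for k in range(n_orient)]


def arf_backward(img, w, grad_out):
    """Gradient of the loss with respect to the single stored filter w.

    grad_out[k] is the error arriving at orientation map k. The gradient
    for the k-th rotated copy is rotated back into the frame of w, and
    the contributions of all copies are summed.
    """
    size = len(w)  # assumes a square filter
    grad_w = [[0.0] * size for _ in range(size)]
    for k, g in enumerate(grad_out):
        g_k = rotate(correlate_valid(img, g), -k)
        for a in range(size):
            for b in range(size):
                grad_w[a][b] += g_k[a][b]
    return grad_w
```

In this simplified setting, rotating the input by 90 degrees rotates each response map and cyclically shifts the orientation index, which is one way to read "location and orientation explicitly encoded". The parameter count stays that of a single filter even though four responses are produced.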
