Non-Parametric Transformation Networks

ConvNets have proven highly effective in applications that require learning invariance to within-class nuisance transformations. However, their architecture enforces invariance only to translation. In this paper, we introduce a new class of convolutional architectures called Non-Parametric Transformation Networks (NPTNs), which can learn general invariances and symmetries directly from data. NPTNs are a direct and natural generalization of ConvNets and can be optimized with standard gradient descent. They make no assumptions about the structure of the invariances present in the data, which makes them flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks, which establishes the natural connection between the two. We demonstrate the efficacy of NPTNs on natural data such as MNIST and CIFAR-10, where they outperform ConvNet baselines with the same number of parameters. We show that NPTNs are effective at learning invariances that are unknown a priori, directly from data and from scratch. Finally, we apply NPTNs within Capsule Networks and show that they enable Capsule Networks to perform even better.
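To make the idea concrete, below is a minimal PyTorch sketch of an NPTN-style layer. It assumes the transformation-pooling design suggested by the abstract: each input/output channel pair carries a small set of G learned filters (a non-parametric stand-in for a transformation group), invariance comes from max-pooling over the G filter responses, and the results are then mean-pooled across input channels. The class name `NPTNLayer`, the pooling order, and all hyperparameters are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class NPTNLayer(nn.Module):
    """Sketch of an NPTN-style layer (assumed design, not the paper's code).

    Each (input channel, output channel) pair owns G independent filters.
    Invariance is obtained by max-pooling over the G responses
    (transformation pooling), followed by mean-pooling over input
    channels (channel pooling).
    """

    def __init__(self, in_channels, out_channels, G, kernel_size=3):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.G = G
        # One grouped convolution computes all in*out*G filter responses:
        # each group sees a single input channel and emits out*G maps.
        self.conv = nn.Conv2d(
            in_channels,
            in_channels * out_channels * G,
            kernel_size,
            padding=kernel_size // 2,
            groups=in_channels,
            bias=False,
        )

    def forward(self, x):
        n, _, h, w = x.shape
        y = self.conv(x)  # (N, in*out*G, H, W), group-major channel layout
        y = y.view(n, self.in_channels, self.out_channels, self.G, h, w)
        y = y.max(dim=3).values  # transformation (max) pooling over G
        y = y.mean(dim=1)        # channel (mean) pooling over inputs
        return y                 # (N, out_channels, H, W)


# Usage: a drop-in replacement for a conv layer with the same channel counts.
layer = NPTNLayer(in_channels=3, out_channels=16, G=4)
out = layer(torch.randn(8, 3, 32, 32))  # -> torch.Size([8, 16, 32, 32])
```

Note that with G = 1 the max over transformations is a no-op and the layer reduces to an ordinary convolution (up to averaging rather than summing input channels), which mirrors the abstract's claim that NPTNs are a direct generalization of ConvNets.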
