MATRIX CAPSULES WITH EM ROUTING

A capsule is a group of neurons whose outputs represent different properties of the same entity. Each layer in a capsule network contains many capsules. We describe a version of capsules in which each capsule has a logistic unit to represent the presence of an entity and a 4×4 matrix which could learn to represent the relationship between that entity and the viewer (the pose). A capsule in one layer votes for the pose matrix of many different capsules in the layer above by multiplying its own pose matrix by trainable viewpoint-invariant transformation matrices that could learn to represent part-whole relationships. Each of these votes is weighted by an assignment coefficient. These coefficients are iteratively updated for each image using the Expectation-Maximization (EM) algorithm, so that the output of each capsule is routed to a capsule in the layer above that receives a cluster of similar votes. The transformation matrices are trained discriminatively by backpropagating through the unrolled iterations of EM between each pair of adjacent capsule layers. On the smallNORB benchmark, capsules reduce the number of test errors by 45% compared to the state of the art. Capsules also show far more resistance to white-box adversarial attacks than our baseline convolutional neural network.
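To make the routing step concrete, the following NumPy sketch implements one simplified EM-routing stage between adjacent capsule layers. It is an illustration under stated assumptions, not the paper's full algorithm: the inverse temperature is held fixed rather than annealed, the learned cost terms and the update of the higher-layer activation probabilities are omitted, and all names (em_routing_sketch, votes, activations_in) are our own.

    import numpy as np

    def em_routing_sketch(votes, activations_in, n_iters=3, inv_temp=1.0):
        """One simplified EM-routing stage between adjacent capsule layers.

        votes:          (n_in, n_out, 16) array; votes[i, j] is lower capsule
                        i's 4x4 pose times the transform W_ij, flattened.
        activations_in: (n_in,) presence probabilities of the lower capsules.
        Returns mu, the (n_out, 16) fitted pose of each higher-layer capsule.
        """
        n_in, n_out, _ = votes.shape
        # Start by routing each lower capsule equally to every higher capsule.
        r = np.full((n_in, n_out), 1.0 / n_out)
        for _ in range(n_iters):
            # M-step: fit an axis-aligned Gaussian per higher capsule to the
            # votes it receives, weighted by routing and input activation.
            w = r * activations_in[:, None]                    # (n_in, n_out)
            w_sum = w.sum(axis=0) + 1e-8                       # (n_out,)
            mu = (w[:, :, None] * votes).sum(axis=0) / w_sum[:, None]
            var = ((w[:, :, None] * (votes - mu) ** 2).sum(axis=0)
                   / w_sum[:, None] + 1e-8)
            # E-step: reassign routing weights in proportion to each vote's
            # log-likelihood under each higher capsule's Gaussian.
            log_p = -0.5 * (((votes - mu) ** 2) / var + np.log(var)).sum(axis=2)
            log_p -= log_p.max(axis=1, keepdims=True)          # stabilise exp
            r = np.exp(inv_temp * log_p)
            r /= r.sum(axis=1, keepdims=True)
        return mu

The votes fed into this routine are produced as the abstract describes: each lower capsule's 4×4 pose matrix is multiplied by a trainable transformation matrix for each higher capsule. A hypothetical usage with 8 lower and 4 higher capsules:

    rng = np.random.default_rng(0)
    poses = rng.normal(size=(8, 4, 4))        # lower-layer 4x4 pose matrices
    W = rng.normal(size=(8, 4, 4, 4))         # trainable transforms W_ij
    votes = np.einsum('iab,ijbc->ijac', poses, W).reshape(8, 4, 16)
    mu = em_routing_sketch(votes, rng.uniform(size=8))   # (4, 16)

Each row of the returned mu can be reshaped back into a 4×4 pose matrix for the layer above; in training, gradients flow through these unrolled iterations into the transforms W.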
