Sparse Unsupervised Capsules Generalize Better

We show that unsupervised training of latent capsule layers using only the reconstruction loss, without masking to select the correct output class, causes a loss of equivariances and other desirable capsule qualities. This implies that supervised capsules networks cannot be very deep. Unsupervised sparsening of latent capsule layer activity both restores these qualities and appears to generalize better than supervised masking, while potentially enabling deeper capsules networks. We train a sparse, unsupervised capsules network with a geometry similar to that of Sabour et al. (2017) on MNIST, then test classification accuracy on affNIST using an SVM layer. Accuracy improves from the 79% benchmark to 90%.
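As a rough sketch of the core idea (not the authors' implementation), the label-driven mask that Sabour et al. (2017) apply before the reconstruction decoder can be replaced by a mask computed from the capsules' own activity, so that training requires no labels. The PyTorch helper below, and names such as sparse_mask, capsnet, and decoder, are illustrative assumptions.

```python
import torch

def sparse_mask(capsule_poses: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Zero all but the k most active capsules (a hypothetical sparsening step).

    capsule_poses: [batch, num_capsules, pose_dim], the output of the final
    capsule layer. Activity is measured by pose-vector length, as in standard
    CapsNets; the sparsened poses are what the reconstruction decoder sees,
    in place of the label-selected capsule used in supervised training.
    """
    activity = capsule_poses.norm(dim=-1)                   # [batch, num_capsules]
    winners = activity.topk(k, dim=-1).indices               # k most active capsules per sample
    mask = torch.zeros_like(activity).scatter_(-1, winners, 1.0)
    return capsule_poses * mask.unsqueeze(-1)                 # zero out the losers


# Unsupervised training step (sketch): reconstruct the input from the
# sparsened code; class labels are never used in the loss.
#   poses = capsnet(images)                                  # [batch, num_capsules, pose_dim]
#   recon = decoder(sparse_mask(poses).flatten(start_dim=1))
#   loss  = torch.nn.functional.mse_loss(recon, images.flatten(start_dim=1))
```

After unsupervised training on MNIST, the capsule activities can then be classified by a separate linear classifier such as an SVM (e.g. sklearn.svm.LinearSVC) and evaluated on affNIST, which is the generalization test summarized above.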

[1] David J. Field, et al. Sparse coding with an overcomplete basis set: A strategy employed by V1?, 1997, Vision Research.

[2] William T. Freeman, et al. Understanding belief propagation and its generalizations, 2003.

[3] Gideon Kowadlo, et al. Computational Neuroscience Offers Hints for More General Machine Learning, 2017, AGI.

[4] Geoffrey E. Hinton, et al. Matrix capsules with EM routing, 2018, ICLR.

[5] Nojun Kwak, et al. Broadcasting Convolutional Network for Visual Relational Reasoning, 2017, ECCV.

[6] Y. LeCun, et al. Learning methods for generic object recognition with invariance to pose and lighting, 2004, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Geoffrey E. Hinton, et al. 3D Object Recognition with Deep Belief Nets, 2009, NIPS.

[8] Marc'Aurelio Ranzato, et al. Building high-level features using large scale unsupervised learning, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing.

[9] Kjersti Engan, et al. Method of optimal directions for frame design, 1999, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

[10] Joachim M. Buhmann, et al. Kickback Cuts Backprop's Red-Tape: Biologically Plausible Credit Assignment in Neural Networks, 2014, AAAI.

[11] Geoffrey E. Hinton, et al. Dynamic Routing Between Capsules, 2017, NIPS.

[12] Honglak Lee, et al. Sparse deep belief net model for visual area V2, 2007, NIPS.

[13] Fang Zhao, et al. Marginalized CNN: Learning Deep Invariant Representations, 2017, BMVC.

[14] Tijmen Tieleman, et al. Optimizing Neural Networks that Generate Images, 2014.

[15] Yoshua Bengio, et al. Why Does Unsupervised Pre-training Help Deep Learning?, 2010, AISTATS.

[16] Brendan J. Frey, et al. Winner-Take-All Autoencoders, 2014, NIPS.

[17] Abhinav Gupta, et al. Contextual Priming and Feedback for Faster R-CNN, 2016, ECCV.

[18] Geoffrey E. Hinton, et al. Transforming Auto-Encoders, 2011, ICANN.

[19] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.

[20] Brendan J. Frey, et al. k-Sparse Autoencoders, 2013, ICLR.

[21] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[22] Serge J. Belongie, et al. Residual Networks Behave Like Ensembles of Relatively Shallow Networks, 2016, NIPS.

[23] Samy Bengio, et al. Adversarial examples in the physical world, 2016, ICLR.

[24] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[25] Yoshua Bengio, et al. Towards Biologically Plausible Deep Learning, 2015, ArXiv.

[26] Michael Elad, et al. K-SVD: Design of Dictionaries for Sparse Representation, 2005.