The recently proposed CapsNet has attracted the attention of many researchers. It is a potential alternative to convolutional neural networks (CNNs) and achieves a significant performance increase on simple datasets such as MNIST. However, CapsNet performs poorly on more complex datasets such as CIFAR-10. To address this problem, we improve the original CapsNet in both its network structure and its dynamic routing mechanism, and propose a new CapsNet architecture for complex data called the Capsule Network based on Deep Dynamic Routing Mechanism (DDRM-CapsNet). To extract better features, we increase the number of convolutional layers before the capsule layer in the encoder. We also improve the dynamic routing mechanism of the original CapsNet by expanding it into two stages and increasing the dimensionality of the final output vector. To verify the efficacy of the proposed network on complex data, we conduct experiments on five real-world complex datasets using a single model, without ensemble methods or data augmentation. The experimental results demonstrate that our method achieves higher accuracy than the baseline and also improves reconstruction performance while using the same decoder structure as the original CapsNet.
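DDRM-CapsNet builds on the dynamic routing mechanism of the original CapsNet. As a point of reference, the following is a minimal pure-Python sketch of that original routing-by-agreement step: the squash nonlinearity, softmax coupling coefficients, and agreement-based logit updates. It does not implement the two-stage extension proposed here, whose details are not given in this abstract. The prediction vectors `u_hat` and the choice of 3 iterations are illustrative assumptions, not values from the paper.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Scale a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement in the style of the original CapsNet.

    u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j.
    Returns one output vector v_j per upper capsule.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    # Routing logits b_ij start at zero, i.e. uniform coupling.
    b = [[0.0] * n_upper for _ in range(n_lower)]
    v: List[Vector] = []
    for it in range(iterations):
        # Coupling coefficients: for each lower capsule, softmax over upper capsules.
        c = [softmax(b_i) for b_i in b]
        # Weighted sum of predictions for each upper capsule.
        s = [
            [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
            for j in range(n_upper)
        ]
        v = [squash(s_j) for s_j in s]
        # Raise the logits where a prediction agrees (dot product) with the output;
        # the update after the final iteration would be unused, so it is skipped.
        if it < iterations - 1:
            for i in range(n_lower):
                for j in range(n_upper):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v


# Hypothetical predictions: 3 lower capsules, 2 upper capsules, 2-D vectors.
# The lower capsules agree strongly on upper capsule 0 and weakly on capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
    [[1.1, -0.1], [0.0, -0.1]],
]
v = dynamic_routing(u_hat, iterations=3)
lengths = [math.hypot(*v_j) for v_j in v]
print(lengths)
```

Because the lower capsules agree on upper capsule 0, its output vector is longer than that of capsule 1, and the squash keeps every length below 1 so it can be read as an existence probability. Pure Python is used here only for readability; practical implementations batch these operations as tensor ops.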