CapsNet based on Encoder and Decoder for Object Detection

The recently proposed capsule network (CapsNet) can learn hierarchical relationships among entity features and achieve equivariance to affine transformations, which makes the capsule architecture promising for object detection. In this paper, we build CapsNet-V1 models for object detection on the capsule architecture. The proposed CapsNet-V1 consists mainly of a classification net, used as an encoder to extract multi-class information, and a reconstruction net, used as a decoder to obtain masks carrying multi-object position information. In experiments on a randomly expanded MNIST dataset, we evaluate the multi-object classification and reconstruction abilities of the proposed CapsNet-V1 simultaneously. The results indicate that our capsule models can reconstruct object masks with accurate location information at the correct labels, which demonstrates the feasibility of using capsule networks for object detection. Further, our CapsNet-V1 can be applied to multi-object detection against simple backgrounds on industrial production lines.
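To make the evaluation setup concrete, the following is a minimal sketch of how a "randomly expanded" multi-object sample and its per-class mask targets could be assembled. It is not the authors' pipeline: the function name `place_digits`, the 64×64 canvas, the use of the digit intensities themselves as mask targets, and the random stand-in patches (used here instead of real MNIST images) are all assumptions made for illustration.

```python
import numpy as np

def place_digits(digits, labels, canvas_size=64, num_classes=10, rng=None):
    """Paste digit patches at random positions on a blank canvas.

    Returns the canvas, a multi-hot label vector (classification target),
    and one mask per class (hypothetical reconstruction target).
    """
    rng = np.random.default_rng() if rng is None else rng
    canvas = np.zeros((canvas_size, canvas_size), dtype=np.float32)
    multi_hot = np.zeros(num_classes, dtype=np.float32)
    masks = np.zeros((num_classes, canvas_size, canvas_size), dtype=np.float32)
    for patch, label in zip(digits, labels):
        h, w = patch.shape
        # Random top-left corner chosen so the patch stays inside the canvas.
        top = int(rng.integers(0, canvas_size - h + 1))
        left = int(rng.integers(0, canvas_size - w + 1))
        region = (slice(top, top + h), slice(left, left + w))
        # Overlapping digits are merged with a pixel-wise maximum.
        canvas[region] = np.maximum(canvas[region], patch)
        masks[label][region] = np.maximum(masks[label][region], patch)
        multi_hot[label] = 1.0
    return canvas, multi_hot, masks

# Stand-in 28x28 patches; real MNIST digits would be loaded here instead.
rng = np.random.default_rng(0)
patches = [rng.random((28, 28)).astype(np.float32) for _ in range(2)]
canvas, multi_hot, masks = place_digits(patches, [3, 7], rng=rng)
```

Under these assumptions, the encoder would be trained against `multi_hot`, while the decoder output for each present class would be compared with the corresponding slice of `masks`, whose nonzero region encodes where that object sits on the canvas.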
