Inception Single Shot MultiBox Detector for object detection

In the current object detection field, one of the fastest algorithms is the Single Shot Multi-Box Detector (SSD), which uses a single convolutional neural network to detect the object in an image. Although SSD is fast, there is a big gap compared with the state-of-the-art on mAP. In this paper, we propose a method to improve SSD algorithm to increase its classification accuracy without affecting its speed. We adopt the Inception block to replace the extra layers in SSD, and call this method Inception SSD (I-SSD). The proposed network can catch more information without increasing the complexity. In addition, we use the batch-normalization (BN) and the residual structure in our I-SSD network architecture. Besides, we propose an improved non-maximum suppression method to overcome its deficiency on the expression ability of the model. The proposed I-SSD algorithm achieves 78.6% mAP on the Pascal VOC2007 test, which outperforms SSD algorithm while maintaining its time performance. We also construct an Outdoor Object Detection (OOD) dataset to testify the effectiveness of the proposed I-SSD on the platform of unmanned vehicles.

[1]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[2]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[7]  Jian Sun,et al.  Object Detection Networks on Convolutional Feature Maps , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[9]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[10]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2015, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[13]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.