Paper information - Wide-residual-inception networks for real-time object detection

Wide-residual-inception networks for real-time object detection

Since convolutional neural network (CNN) models emerged, several tasks in computer vision have actively deployed CNN models for feature extraction. However, conventional CNN models have a high computational cost and require high memory capacity, which is impractical and unaffordable for commercial applications such as real-time on-road object detection on embedded boards or mobile platforms. To tackle this limitation of CNN models, this paper proposes a wide-residual-inception (WR-Inception) network, whose architecture is built on a residual inception unit that captures objects of various sizes on the same feature map, and which uses shallower and wider layers than state-of-the-art networks such as ResNets. To verify the proposed networks, this paper reports two experiments: a classification task on CIFAR-10/100 and an on-road object detection task using a Single-Shot Multi-box Detector (SSD) on the KITTI dataset. WR-Inception achieves comparable accuracy on CIFAR-10/100, with test errors of 4.82% and 23.12%, respectively, which outperforms 164-layer Pre-ResNets. In addition, the detection experiments demonstrate that the WR-Inception-based SSD outperforms the ResNet-101-based SSD on KITTI. Moreover, the WR-Inception-based SSD achieves 16 frames per second, which is 3.85 times faster than the ResNet-101-based SSD. We expect WR-Inception to be usable in real application systems.
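To put the speed figures in the abstract into per-frame terms, the following short sketch converts them into latencies. It uses only the two numbers quoted above (16 frames per second and the 3.85x ratio). It assumes that "3.85 times faster" is a ratio of frame rates, which the abstract does not state explicitly. The function and variable names are illustrative, and the implied ResNet-101 figure is derived here rather than taken from the paper.

```python
# Figures quoted in the abstract.
WR_INCEPTION_SSD_FPS = 16.0      # frames per second
SPEEDUP_OVER_RESNET101 = 3.85    # "3.85 times faster"


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps


# Assumption: the speedup is a ratio of frame rates, so the baseline
# throughput is the WR-Inception throughput divided by the speedup.
resnet101_ssd_fps = WR_INCEPTION_SSD_FPS / SPEEDUP_OVER_RESNET101

print(f"WR-Inception SSD: {latency_ms(WR_INCEPTION_SSD_FPS):.1f} ms/frame")
print(
    f"ResNet-101 SSD (implied): {resnet101_ssd_fps:.2f} FPS, "
    f"{latency_ms(resnet101_ssd_fps):.1f} ms/frame"
)
```

Under that assumption, the WR-Inception-based SSD spends about 62.5 ms per frame, while the implied ResNet-101-based baseline runs at roughly 4.16 frames per second, or about 240.6 ms per frame. The abstract does not state the hardware on which these rates were measured.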
