Wide-residual-inception networks for real-time object detection

Since convolutional neural network (CNN) models emerged, several tasks in computer vision have actively deployed CNN models for feature extraction. However, the conventional CNN models have a high computational cost and require high memory capacity, which is impractical and unaffordable for commercial applications such as real-time on-road object detection on embedded boards or mobile platforms. To tackle this limitation of CNN models, this paper proposes a wide-residual-inception (WR-Inception) network, which constructs the architecture based on a residual inception unit that captures objects of various sizes on the same feature map, as well as shallower and wider layers, compared to state-of-the-art networks like ResNets. To verify the proposed networks, this paper conducted two experiments; one is a classification task on CIFAR-10/100 and the other is an on-road object detection task using a Single-Shot Multi-box Detector (SSD) on the KITTI dataset. WR-Inception achieves comparable accuracy on CIFAR-10/100, with test errors at 4.82% and 23.12%, respectively, which outperforms 164-layer Pre-ResNets. In addition, the detection experiments demonstrate that the WR-Inception-based SSD outperforms ResNet-101 — based SSD on KITTI. Besides, WR-Inception-based SSD achieves 16 frames per seconds, which is 3.85 times faster than ResNet-101-based SSD. We could expect WR-Inception to be used for real application systems.

[1]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[5]  Brahim Chaib-draa,et al.  Parametric Exponential Linear Unit for Deep Convolutional Neural Networks , 2016, 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA).

[6]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[7]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[9]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[10]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[12]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[14]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[15]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[17]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[18]  Atsuto Maki,et al.  Factors of Transferability for a Generic ConvNet Representation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  G. Sottile,et al.  Characterization and performance of the ASIC (CITIROC) front-end of the ASTRI camera , 2015, 1506.00264.

[20]  Xiaogang Wang,et al.  Factors in Finetuning Deep Model for Object Detection with Long-Tail Distribution , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[24]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Jian Sun,et al.  Convolutional neural networks at constrained time cost , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[27]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[28]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[29]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[30]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[31]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[33]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[34]  Jürgen Schmidhuber,et al.  Highway Networks , 2015, ArXiv.

[35]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[36]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[38]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[40]  Guosheng Lin,et al.  Deep convolutional neural fields for depth estimation from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Serge J. Belongie,et al.  Residual Networks are Exponential Ensembles of Relatively Shallow Networks , 2016, ArXiv.

[43]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.