An Efficient Hierarchical Convolutional Neural Network for Traffic Object Detection

In this paper, we propose a novel hierarchical convolutional neural network for traffic object detection, termed Fusion and Multi-level Alignment CNN (FMLA-CNN). The method extends a popular two-stage detector with a modified feature fusion module and a multi-level alignment (MLA) strategy, so that it can efficiently detect multi-scale objects in autonomous driving scenarios. The feature fusion strategy in the proposal generation network improves detection accuracy by injecting high-level semantics into the whole pyramidal feature hierarchy. The MLA strategy in the second detection stage then preserves exact spatial locations when pooling from the feature layers that correspond to the hierarchical region-of-interest proposals. In experiments on the KITTI benchmark, FMLA-CNN achieves a better trade-off between accuracy and efficiency than other state-of-the-art methods.
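
To make concrete the idea of matching each region-of-interest proposal to a layer of the feature hierarchy, the following is a minimal sketch. The abstract does not state the assignment rule used by FMLA-CNN, so the sketch substitutes the scale-based heuristic from Feature Pyramid Networks; the function names, the level range 2 to 5, and the canonical size of 224 pixels are assumptions made for illustration only.

```python
import math


def assign_pyramid_level(box, k_min=2, k_max=5, k0=4, canonical_size=224.0):
    """Map one RoI (x1, y1, x2, y2) to a pyramid level.

    Assumption: FPN-style rule k = floor(k0 + log2(sqrt(w * h) / canonical_size)),
    clamped to [k_min, k_max]. This is a stand-in, not the rule from the paper.
    """
    x1, y1, x2, y2 = box
    # Guard against degenerate boxes so the logarithm stays defined.
    w = max(x2 - x1, 1e-6)
    h = max(y2 - y1, 1e-6)
    scale = math.sqrt(w * h)
    k = math.floor(k0 + math.log2(scale / canonical_size))
    return int(min(max(k, k_min), k_max))


def group_rois_by_level(boxes, **kwargs):
    """Group RoI indices by assigned level, so each group can be pooled
    from its own feature map in the second detection stage."""
    groups = {}
    for idx, box in enumerate(boxes):
        level = assign_pyramid_level(box, **kwargs)
        groups.setdefault(level, []).append(idx)
    return groups
```

Under these assumptions, small proposals (for example distant pedestrians) are pooled from finer, higher-resolution levels and large proposals (nearby vehicles) from coarser ones, which is the general motivation for multi-level pooling in multi-scale detection.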
