Multi-Task Vehicle Detection With Region-of-Interest Voting

Vehicle detection is a challenging problem in autonomous driving systems because vehicles vary widely in structure and appearance. In this paper, we propose a novel vehicle detection scheme based on multi-task deep convolutional neural networks (CNNs) and region-of-interest (RoI) voting. In designing the CNN architecture, we enrich the supervision with the subcategory, region overlap, bounding-box regression targets, and category of each training RoI, forming a multi-task learning framework. This design allows the CNN model to share visual knowledge among different vehicle attributes, which improves detection robustness. In addition, most existing methods consider each RoI independently, ignoring the clues from its neighboring RoIs. In our approach, we use the CNN model to predict the offset direction of each RoI boundary toward the corresponding ground truth. Each RoI can then vote for the adjacent bounding boxes that are consistent with this additional information. The voting results are combined with the score of each RoI itself to find a more accurate location among a large number of candidates. Experimental results on the real-world computer vision benchmark KITTI and the PASCAL2007 vehicle data set show that our approach achieves superior vehicle detection performance compared with existing published methods.
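
The RoI voting idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how directions are encoded, how neighbors are selected, or how votes are weighted. The sketch assumes each RoI carries one predicted direction per boundary (-1, 0, or +1 for left, top, right, bottom), that neighbors are boxes whose IoU with the voter is at least `min_iou`, and that votes are added to the RoI's own score with a fixed `vote_weight`. The names `is_consistent`, `vote_scores`, `min_iou`, `vote_weight`, and `tol` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_consistent(voter: Box, directions: Sequence[int],
                  candidate: Box, tol: float = 1.0) -> bool:
    """Check that each candidate boundary lies where the voter's predicted
    offset direction points (assumed encoding: -1, 0, +1 per boundary)."""
    for k in range(4):
        delta = candidate[k] - voter[k]
        d = directions[k]
        if d == 0:
            # Voter believes this boundary is already about right.
            if abs(delta) > tol:
                return False
        elif delta * d <= 0:
            # Candidate boundary did not move in the predicted direction.
            return False
    return True


def vote_scores(boxes: List[Box], scores: List[float],
                directions: List[Sequence[int]], min_iou: float = 0.5,
                vote_weight: float = 0.1, tol: float = 1.0) -> List[float]:
    """Combine each RoI's own score with the votes cast by its neighbors."""
    votes = [0.0] * len(boxes)
    for i, voter in enumerate(boxes):
        for j, cand in enumerate(boxes):
            if i == j:
                continue
            if iou(voter, cand) >= min_iou and is_consistent(
                    voter, directions[i], cand, tol):
                votes[j] += scores[i]  # assumption: votes weighted by voter score
    return [s + vote_weight * v for s, v in zip(scores, votes)]
```

With these assumptions, a candidate that starts with a slightly lower score can still be selected when several neighboring RoIs agree that its boundaries lie in the direction of the ground truth.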
