A Block Object Detection Method Based on Feature Fusion Networks for Autonomous Vehicles

Nowadays, automatic multi-objective detection remains a challenging problem for autonomous vehicle technologies. In the past decades, deep learning has been demonstrated successful for multi-objective detection, such as the Single Shot Multibox Detector (SSD) model. The current trend is to train the deep Convolutional Neural Networks (CNNs) with online autonomous vehicle datasets. However, network performance usually degrades when small objects are detected. Moreover, the existing autonomous vehicle datasets could not meet the need for domestic traffic environment. To improve the detection performance of small objects and ensure the validity of the dataset, we propose a new method. Specifically, the original images are divided into blocks as input to a VGG-16 network which add the feature map fusion after CNNs. Moreover, the image pyramid is built to project all the blocks detection results at the original objects size as much as possible. In addition to improving the detection method, a new autonomous driving vehicle dataset is created, in which the object categories and labelling criteria are defined, and a data augmentation method is proposed. The experimental results on the new datasets show that the performance of the proposed method is greatly improved, especially for small objects detection in large image. Moreover, the proposed method is adaptive to complex climatic conditions and contributes a lot for autonomous vehicle perception and planning.

[1]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Jeffrey Xu Yu,et al.  Context-Based Diversification for Keyword Queries Over XML Data , 2015, IEEE Transactions on Knowledge and Data Engineering.

[3]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[4]  Lawrence D. Jackel,et al.  Optical character recognition for self-service banking , 1995, AT&T Technical Journal.

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[8]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Jianxin Li,et al.  Influence Propagation Model for Clique-Based Community Detection in Social Networks , 2018, IEEE Transactions on Computational Social Systems.

[10]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[11]  Daphna Weinshall,et al.  Exploiting Object Hierarchy: Combining Models from Different Category Levels , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[13]  Jianxiong Xiao,et al.  DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[15]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jianxin Li,et al.  Discovering and tracking query oriented active online social groups in dynamic information network , 2018, World Wide Web.

[18]  Jianxin Li,et al.  XClean: Providing valid spelling suggestions for XML keyword queries , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[19]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[21]  Chunhong Pan,et al.  Feature Extraction by Rotation-Invariant Matrix Representation for Object Detection in Aerial Image , 2017, IEEE Geoscience and Remote Sensing Letters.

[22]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[23]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[24]  Danwei Wang,et al.  Trajectory planning for a four-wheel-steering vehicle , 2001, Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164).

[25]  Feng Xia,et al.  Exploring Human Mobility Patterns in Urban Scenarios: A Trajectory Data Perspective , 2018, IEEE Communications Magazine.

[26]  Jianxin Li,et al.  Access Time Oracle for Planar Graphs , 2016, IEEE Transactions on Knowledge and Data Engineering.

[27]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Tao Wang,et al.  Deep learning with COTS HPC systems , 2013, ICML.

[29]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.