Object Detection Based on Multi-Layer Convolution Feature Fusion and Online Hard Example Mining

Object detection is a significant problem in visual surveillance. The faster region-based convolutional neural network (Faster R-CNN) is a typical deep-learning object detection algorithm; however, both its generalization ability and its detection accuracy for small objects are limited. In this paper, an effective object detection algorithm for small and occluded objects is proposed, based on multi-layer convolution feature fusion (MCFF) and online hard example mining (OHEM). First, candidate regions are generated by a region proposal network (RPN) optimized with MCFF. Then, OHEM is employed to train the region-based ConvNet detector: hard examples are selected automatically to improve training efficiency, and avoiding invalid (uninformative) examples accelerates the convergence of model training. Experiments are performed on the KITTI dataset in an intelligent-traffic scenario. The proposed method outperforms popular methods such as Faster R-CNN and Regionlets in overall detection accuracy. Furthermore, it performs well at detecting small and occluded objects.
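The hard-example selection step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that per-RoI losses have already been computed in a forward-only pass of the detector. The non-maximum suppression (NMS) threshold of 0.7 and the per-image budget of 64 RoIs are borrowed from the original OHEM formulation rather than stated in this abstract. `iou` and `select_hard_examples` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_hard_examples(boxes: Sequence[Box], losses: Sequence[float],
                         batch_size: int = 64, nms_thresh: float = 0.7) -> List[int]:
    """Return indices of the RoIs whose losses would be back-propagated.

    RoIs are visited in order of decreasing loss (hardest first). Greedy NMS,
    using the loss as the score, drops any RoI that overlaps an already-kept
    RoI by more than nms_thresh, so one hard region cannot fill the batch.
    At most batch_size RoIs are kept.
    """
    order = sorted(range(len(boxes)), key=lambda i: losses[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if len(keep) == batch_size:
            break
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


# Usage: RoI 1 overlaps RoI 0 (IoU = 0.81) and is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
losses = [2.0, 1.9, 0.5, 1.2]
keep = select_hard_examples(boxes, losses, batch_size=2)
print(keep)  # → [0, 3]
hard_loss = sum(losses[i] for i in keep) / len(keep)  # loss used for the backward pass
```

In this sketch, only the selected indices would contribute gradients. The remaining RoIs are treated as easy examples and skipped. Running NMS before filling the budget keeps near-duplicate high-loss proposals from crowding out other informative regions.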
