Deep Pedestrian Detection Using Contextual Information and Multi-level Features

Recently, Faster R-CNN achieves great performance in deep learning based object detection. However, a major bottleneck of Faster R-CNN lies on the sharp performance deterioration when detecting objects that are small in size or have a similar appearance with their backgrounds. To address this problem, we present a new pedestrian detection approach based on Faster R-CNN, which combines contextual information with multi-level features. The contextual information is embedded by pooling information from a larger area around the original region of interest. It helps pedestrians detection from cluttered backgrounds. The multi-level features can be obtained by pooling proposal-specific features from several shallow but high-resolution layers. These features are more informative for detecting small-size pedestrians. Extensive experiments on the challenging Caltech dataset validate that our approach not only performs better than the baseline of Faster R-CNN but also boosts the detection performance when combined with contextual information and multi-level features. Meanwhile, compared with numerous pedestrian detection approaches, our combined method outperforms all of them and achieves a quite superior performance.

[1]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[2]  Xiaogang Wang,et al.  Deep Learning Strong Parts for Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Xiaogang Wang,et al.  Pedestrian detection aided by deep learning semantic tasks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[10]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[11]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Jungwon Lee,et al.  Fused DNN: A Deep Neural Network Fusion Approach to Fast and Robust Pedestrian Detection , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[14]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[16]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[17]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[19]  B. Schiele,et al.  How Far are We from Solving Pedestrian Detection? , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.