Speed up deep neural network based pedestrian detection by sharing features across multi-scale models

Deep neural networks (DNNs) have now demonstrated state-of-the-art detection performance on pedestrian datasets. However, because of their high computational complexity, detection efficiency is still a frustrating problem even with the help of Graphics Processing Units (GPUs). To improve detection efficiency, this paper proposes to share features across a group of DNNs that correspond to pedestrian models of different sizes. By sharing features, the computational burden for extracting features from an image pyramid can be significantly reduced. Simultaneously, we can detect pedestrians of several different scales on one single layer of an image pyramid. Furthermore, the improvement of detection efficiency is achieved with negligible loss of detection accuracy. Experimental results demonstrate the robustness and efficiency of the proposed algorithm.

[1]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[2]  Minho Lee,et al.  Goal-oriented behavior sequence generation based on semantic commands using multiple timescales recurrent neural network with initial state correction , 2014, Neurocomputing.

[3]  王晓刚 Single-Pedestrian Detection aided by Multi-pedestrian Detection , 2013 .

[4]  Kongqiao Wang,et al.  Distributed Object Detection With Linear SVMs , 2014, IEEE Transactions on Cybernetics.

[5]  Gholam Ali Montazer,et al.  An improved radial basis function neural network for object image retrieval , 2015, Neurocomputing.

[6]  Luca Maria Gambardella,et al.  Fast image scanning with deep max-pooling convolutional neural networks , 2013, 2013 IEEE International Conference on Image Processing.

[7]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[10]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Shiguang Shan,et al.  AU-inspired Deep Networks for Facial Expression Feature Learning , 2015, Neurocomputing.

[12]  Xiaogang Wang,et al.  Multi-stage Contextual Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[14]  Xuelong Li,et al.  Flexible sliding windows with adaptive pixel strides , 2015, Signal Process..

[15]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[17]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[18]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  Xiaogang Wang,et al.  Switchable Deep Network for Pedestrian Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Xiaogang Wang,et al.  DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection , 2014, ArXiv.

[21]  Bernt Schiele,et al.  New features and insights for pedestrian detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Bernt Schiele,et al.  Ten Years of Pedestrian Detection, What Have We Learned? , 2014, ECCV Workshops.

[23]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[24]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Jitendra Malik,et al.  Analyzing the Performance of Multilayer Neural Networks for Object Recognition , 2014, ECCV.

[28]  Thomas Brox,et al.  Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT , 2014, ArXiv.

[29]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[30]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[31]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[32]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Xuelong Li,et al.  Efficient object detection by prediction in 3D space , 2015, Signal Process..

[34]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.