HeadNet: Pedestrian Head Detection Utilizing Body in Context

Pedestrian head with arbitrary poses and size is prohibitively difficult to detect in many real world applications. An appealing alternative is to utilize object detection technologies, which tend to be more and more mature and faster. However, general object detection technologies can hardly work in complicated scenarios where many heads are often too small to detect. In this paper, we present a novel approach that learns a semantic connection between pedestrian head and other body parts for head detection. Specifically, the proposed model, named as HeadNet, is based on PVANet backbone and also introduces beneficial strategies including online hard example mining (OHEM), fine-grained feature maps, RoI Align and Body in Context (BiC). Experiments demonstrate that our approach is able to utilize spatial semantics of the entire body effectively, and gains inspiring performance for pedestrian head detection.

[1]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Shiguang Shan,et al.  Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[10]  Yeongjae Cheon,et al.  PVANet: Lightweight Deep Neural Networks for Real-time Object Detection , 2016, ArXiv.

[11]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Shiguang Shan,et al.  Improving 2D Face Recognition via Discriminative Face Depth Estimation , 2018, 2018 International Conference on Biometrics (ICB).

[15]  Shiguang Shan,et al.  Deep Multi-Task Learning for Joint Prediction of Heterogeneous Face Attributes , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[16]  Xiaoming Liu,et al.  Demographic Estimation from Face Images: Human vs. Machine Performance , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Xiaogang Wang,et al.  Joint Detection and Identification Feature Learning for Person Search , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[19]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[20]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[24]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.