Scale Mapping and Dynamic Re-Detecting in Dense Head Detection

Convolutional neural networks (CNNs) have demonstrated a strong ability to extract semantics from images during object detection; however, the extracted semantics are typically have strong scale priors for a specific circumstance. In this paper, we investigate the influence of head scale and contextual information, and then propose a scale-invariant method for head detection. Our method can dynamically detect heads depending on the complexity of the image. It uses an extra feature map to represent the scale information of the spatial relationship, and then uses this feature map for auxiliary detection. Particularly, we exploit several new techniques, including contextual information, scale invariance, and hard example mining. We evaluated our method on three head datasets and achieved state-of-the-art results for the Brainwash dataset, HollywoodHeads dataset, and SCUT-HEAD dataset.

[1]  Ivan Laptev,et al.  Context-Aware CNNs for Person Head Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Pei Li,et al.  People counting based on head detection combining Adaboost and CNN in crowded surveillance environment , 2016, Neurocomputing.

[4]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[9]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Xiaogang Wang,et al.  Crafting GBD-Net for Object Detection , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[12]  Shuo Yang,et al.  Face Detection through Scale-Friendly Deep Convolutional Networks , 2017, ArXiv.

[13]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Yong Dou,et al.  Localized region context and object feature fusion for people head detection , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[15]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[16]  Andrew Y. Ng,et al.  End-to-End People Detection in Crowded Scenes , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Shenghua Gao,et al.  Single-Image Crowd Counting via Multi-Column Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Rama Chellappa,et al.  A deep pyramid Deformable Part Model for face detection , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).