Detecting Heads using Feature Refine Net and Cascaded Multi-scale Architecture

This paper presents a method that can accurately detect heads especially small heads under the indoor scene. To achieve this, we propose a novel method, Feature Refine Net (FRN), and a cascaded multi-scale architecture. FRN exploits the multi-scale hierarchical features created by deep convolutional neural networks. The proposed channel weighting method enables FRN to make use of features alternatively and effectively. To improve the performance of small head detection, we propose a cascaded multi-scale architecture which has two detectors. One called global detector is responsible for detecting large objects and acquiring the global distribution information. The other called local detector is designed for small objects detection and makes use of the information provided by global detector. Due to the lack of head detection datasets, we have collected and labeled a new large dataset named SCUT-HEAD which includes 4405 images with 111251 heads annotated. Experiments show that our method has achieved state-of-the-art performance on SCUT-HEAD.

[1]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Andrew Y. Ng,et al.  End-to-End People Detection in Crowded Scenes , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yong Dou,et al.  Localized region context and object feature fusion for people head detection , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[7]  Yunchao Wei,et al.  Perceptual Generative Adversarial Networks for Small Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[9]  Rama Chellappa,et al.  A deep pyramid Deformable Part Model for face detection , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[10]  Shuo Yang,et al.  Face Detection through Scale-Friendly Deep Convolutional Networks , 2017, ArXiv.

[11]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[12]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Ivan Laptev,et al.  Context-Aware CNNs for Person Head Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[15]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[18]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).