VikingDet: A Real-time Person and Face Detector for Surveillance Cameras

In this paper, we propose a novel one-stage detector that simultaneously detects pedestrians and their faces. The framework is named VikingDet for its simple but effective two-headed architecture. To tackle the challenges of person and face detection under surveillance cameras (e.g., low data quality, complex environments, and efficiency requirements), we make three contributions: 1) integrating person and face detection into one network, a combination that current leading object detection algorithms seldom handle; 2) emphasizing detection in low-quality images: we introduce multiple thresholds for matching positive samples of different sizes and set suitable hyper-parameters, so that VikingDet can locate small objects even in low-quality surveillance footage; 3) introducing a training strategy that makes use of the datasets at hand. Since most public datasets annotate either people without their faces or faces without bodies, we use multi-step training and an integrated loss function to train VikingDet on these partially annotated data. As a result, our detector achieves satisfactory performance on several relevant benchmarks at more than 60 FPS on an NVIDIA TITAN X GPU, and it can be further deployed on embedded devices such as the NVIDIA Jetson TX1 or TX2 at a real-time speed of over 28 FPS.
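
For illustration only, the size-dependent matching thresholds and the integrated loss over partially annotated data might be sketched as follows in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 32-pixel size cutoff, and the threshold values 0.35 and 0.5 are all hypothetical, and the abstract does not specify how the two head losses are combined (a plain masked sum is assumed here).

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_threshold(gt: Box, small_side: float = 32.0,
                    small_thr: float = 0.35, default_thr: float = 0.5) -> float:
    """Assumed rule: small ground-truth boxes get a looser IoU threshold.

    The cutoff and both threshold values are hypothetical placeholders.
    """
    shorter_side = min(gt[2] - gt[0], gt[3] - gt[1])
    return small_thr if shorter_side < small_side else default_thr


def is_positive(anchor: Box, gt: Box) -> bool:
    """An anchor is a positive sample if its IoU reaches the size-dependent threshold."""
    return iou(anchor, gt) >= match_threshold(gt)


def integrated_loss(person_loss: float, face_loss: float,
                    has_person_labels: bool, has_face_labels: bool) -> float:
    """Assumed combination: sum only the head losses whose annotations exist.

    An image from a face-only dataset contributes no person-head loss,
    and an image from a person-only dataset contributes no face-head loss.
    """
    total = 0.0
    if has_person_labels:
        total += person_loss
    if has_face_labels:
        total += face_loss
    return total
```

Under these assumed values, an anchor overlapping a small ground-truth box with IoU 0.4 counts as positive, while the same overlap with a large box does not; a face-only training image contributes only the face-head term to the loss.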
