Towards Precise End-to-end Semi-Supervised Human Head Detection Network

Head detection is a fundamental task underlying many head-related problems, and it requires an enormous number of annotated boxes to maintain performance. To reduce the time and cost of labeling every image in a dataset, we propose an end-to-end semi-supervised head detection framework that achieves competitive results with only a small set of labeled data. Specifically, in the semi-supervised setting, we introduce a weak box generation branch and a weak box refinement branch to produce pseudo ground-truth labels for unlabeled images under the guidance of annotated images. The weak box generation branch is embedded in the detection framework; it takes the proposals as input and outputs initial weak boxes that coarsely locate the head. The weak box refinement branch then gradually makes the weak boxes more accurate by training a transferred sub-network on the established relation between proposals, weak boxes, and labeled boxes. During training, we jointly train the two branches in an end-to-end manner, which generates better pseudo bounding boxes online from a small dataset, avoids over-fitting, and yields a more precise head detector. Results on the public head detection benchmarks Brainwash and SCUT-HEAD show the effectiveness of our method.
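
To make the idea of turning scored proposals into coarse pseudo ground-truth boxes concrete, the following is a minimal sketch, not the method described above. In the paper both branches are learned networks; here a fixed score threshold followed by greedy non-maximum suppression is swapped in as a simple stand-in. The names `iou` and `select_weak_boxes`, the box format `(x1, y1, x2, y2)`, and both threshold values are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_weak_boxes(
    proposals: Sequence[Box],
    scores: Sequence[float],
    score_thresh: float = 0.5,  # hypothetical value, not from the paper
    iou_thresh: float = 0.5,    # hypothetical value, not from the paper
) -> List[Box]:
    """Pick coarse pseudo boxes: drop low-score proposals, then apply greedy NMS."""
    candidates = sorted(
        ((s, p) for s, p in zip(scores, proposals) if s >= score_thresh),
        key=lambda sp: sp[0],
        reverse=True,
    )
    kept: List[Box] = []
    for _, box in candidates:
        # Keep a box only if it does not heavily overlap a higher-scoring kept box.
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```

In such a sketch, the boxes returned for an unlabeled image would play the role of the initial weak boxes; a refinement step would then adjust them, which the paper does with a trained sub-network rather than a fixed rule.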
