DVRNet: Decoupled Visible Region Network for Pedestrian Detection

Pedestrian detection remains a challenging task due to the problems caused by occlusion variance. Visible-body bounding boxes are typically used as an extra supervision signal to improve the performance of pedestrian detection to predict the full-body. However, visible-body assisted approaches produce a large number of false positives, which result from a lack of adequate and discriminative full-body contextual information. In this paper, we propose a new network, dubbed DVRNet, based on the representative visible-body assisted pedestrian detector named Bi-box. Specifically, we extend Bi-box by adding three modules named the attention-based feature interleaver module (AFIM), the binary mask learning module (BMLM), and the head-aware feature enhancement module (HFEM), which play important roles in employing features learned by the visible-body and the head supervision signals to enrich high discriminative contextual information of the full-body and enhance the power of feature representation. Experimental results indicate that the DVRNet achieves promising results on the CityPersons and the CrowdHuman datasets.

[1]  Chunluan Zhou,et al.  Bi-box Regression for Pedestrian Detection and Occlusion Estimation , 2018, ECCV.

[2]  Yuning Jiang,et al.  Repulsion Loss: Detecting Pedestrians in a Crowd , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Shiliang Pu,et al.  Small-Scale Pedestrian Detection Based on Topological Line Localization and Temporal Feature Aggregation , 2018, ECCV.

[4]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[5]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[6]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Fahad Shahbaz Khan,et al.  Mask-Guided Attention Network for Occluded Pedestrian Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Yunhong Wang,et al.  Adaptive NMS: Refining Pedestrian Detection in a Crowd , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Gunhee Kim,et al.  Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Shifeng Zhang,et al.  Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd , 2018, ECCV.

[11]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xiangyu Zhang,et al.  CrowdHuman: A Benchmark for Detecting Human in a Crowd , 2018, ArXiv.

[13]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Sergio A. Velastin,et al.  Local Fisher Discriminant Analysis for Pedestrian Re-identification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Wei Liu,et al.  High-Level Semantic Feature Detection: A New Perspective for Pedestrian Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[17]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[18]  Yi Jiang,et al.  SimpleDet: A Simple and Versatile Distributed Framework for Object Detection and Instance Recognition , 2019, J. Mach. Learn. Res..

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jian Yang,et al.  Occluded Pedestrian Detection Through Guided Attention in CNNs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Serestina Viriri,et al.  A Survey on Soft Biometrics for Human Identification , 2018 .

[23]  Wei Liu,et al.  Learning Efficient Single-Stage Pedestrian Detectors by Asymptotic Localization Fitting , 2018, ECCV.

[24]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.