PS-RCNN: Detecting Secondary Human Instances in a Crowd via Primary Object Suppression

Detecting human bodies in highly crowded scenes is a challenging problem. Two main reasons result in such a problem: 1). weak visual cues of heavily occluded instances can hardly provide sufficient information for accurate detection; 2). heavily occluded instances are easier to be suppressed by Non-Maximum-Suppression (NMS). To address these two issues, we introduce a variant of two-stage detectors called PS-RCNN. PS-RCNN first detects slightly/none occluded objects by an R-CNN [1] module (referred as P-RCNN), and then suppress the detected instances by human-shaped masks so that the features of heavily occluded instances can stand out. After that, PS-RCNN utilizes another R-CNN module specialized in heavily occluded human detection (referred as S-RCNN) to detect the rest missed objects by P-RCNN. Final results are the ensemble of the outputs from these two RCNNs. Moreover, we introduce a High Resolution RoI Align (HRRA) module to retain as much of fine-grained features of visible parts of the heavily occluded humans as possible. Our PS-RCNN significantly improves recall and AP by 4.49% and 2.92% respectively on CrowdHuman [2], compared to the baseline. Similar improvements on Widerperson [3] are also achieved by the PS-RCNN.

[1]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[2]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jian Yang,et al.  Occluded Pedestrian Detection Through Guided Attention in CNNs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Xiangyu Zhang,et al.  CrowdHuman: A Benchmark for Detecting Human in a Crowd , 2018, ArXiv.

[5]  Chengyang Li,et al.  Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation , 2018, BMVC.

[6]  Larry S. Davis,et al.  Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Yuning Jiang,et al.  Repulsion Loss: Detecting Pedestrians in a Crowd , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Fahad Shahbaz Khan,et al.  Mask-Guided Attention Network for Occluded Pedestrian Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Shifeng Zhang,et al.  Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd , 2018, ECCV.

[11]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[12]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Chunluan Zhou,et al.  Multi-label Learning of Part Detectors for Heavily Occluded Pedestrian Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Shifeng Zhang,et al.  WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild , 2019, IEEE Transactions on Multimedia.

[15]  Yunhong Wang,et al.  Adaptive NMS: Refining Pedestrian Detection in a Crowd , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.