Mutual-Supervised Feature Modulation Network for Occluded Pedestrian Detection

State-of-the-art pedestrian detectors have achieved significant progress on non-occluded pedestrians, yet they are still struggling under heavy occlusions. The recent occlusion handling strategy of popular two-stage approaches is to build a two-branch architecture with the help of additional visible body annotations. Nonetheless, these methods still have some weaknesses. Either the two branches are trained independently with only score-level fusion, which cannot guarantee the detectors to learn robust enough pedestrian features. Or the attention mechanisms are exploited to only emphasize on the visible body features. However, the visible body features of heavily occluded pedestrians are concentrated on a relatively small area, which will easily cause missing detections. To address the above issues, we propose in this paper a novel Mutual-Supervised Feature Modulation (MSFM) network, to better handle occluded pedestrian detection. The key MSFM module in our network calculates the similarity loss of full body boxes and visible body boxes corresponding to the same pedestrian so that the full-body detector could learn more complete and robust pedestrian features with the assist of contextual features from the occluding parts. To facilitate the MSFM module, we also propose a novel two-branch architecture, consisting of a standard full body detection branch and an extra visible body classification branch. These two branches are trained in a mutual-supervised way with full body annotations and visible body annotations, respectively. To verify the effectiveness of our proposed method, extensive experiments are conducted on two challenging pedestrian datasets: Caltech and CityPersons, and our approach achieves superior performance compared to other state-of-the-art methods on both datasets, especially in heavy occlusion case.

[1]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2]  Shiliang Pu,et al.  Small-Scale Pedestrian Detection Based on Topological Line Localization and Temporal Feature Aggregation , 2018, ECCV.

[3]  Yuning Jiang,et al.  What Can Help Pedestrian Detection? , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Yunhong Wang,et al.  Adaptive NMS: Refining Pedestrian Detection in a Crowd , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Yu-Wing Tai,et al.  Accurate Single Stage Detector Using Recurrent Rolling Convolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  B. Schiele,et al.  How Far are We from Solving Pedestrian Detection? , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Chunluan Zhou,et al.  Multi-label Learning of Part Detectors for Heavily Occluded Pedestrian Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Chunluan Zhou,et al.  Bi-box Regression for Pedestrian Detection and Occlusion Estimation , 2018, ECCV.

[12]  Gang Wang,et al.  Graininess-Aware Deep Feature Learning for Pedestrian Detection , 2018, ECCV.

[13]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Yuning Jiang,et al.  Repulsion Loss: Detecting Pedestrians in a Crowd , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[17]  Chunluan Zhou,et al.  Non-rectangular Part Discovery for Object Detection , 2014, BMVC.

[18]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[19]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[20]  Jinjun Xiong,et al.  Revisiting RCNN: On Awakening the Classification Power of Faster RCNN , 2018, ECCV.

[21]  Fahad Shahbaz Khan,et al.  Mask-Guided Attention Network for Occluded Pedestrian Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Shifeng Zhang,et al.  Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd , 2018, ECCV.

[24]  Ming Yang,et al.  Discriminative Feature Transformation for Occluded Pedestrian Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Xiaogang Wang,et al.  Deep Learning Strong Parts for Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Gunhee Kim,et al.  Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Jian Yang,et al.  Occluded Pedestrian Detection Through Guided Attention in CNNs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[30]  Wei Liu,et al.  Learning Efficient Single-Stage Pedestrian Detectors by Asymptotic Localization Fitting , 2018, ECCV.

[31]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Xiaoming Liu,et al.  Illuminating Pedestrians via Simultaneous Detection and Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Xiaoming Liu,et al.  Pedestrian Detection With Autoregressive Network Phases , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).