Learning Pixel-Level and Instance-Level Context-Aware Features for Pedestrian Detection in Crowds

Pedestrian detection in crowded scenes is a challenging problem in computer vision, and occlusion is one of its main difficulties. In this paper, we propose a novel context-aware feature learning method for detecting pedestrians in crowds, with the aim of making better use of context information to deal with occlusion. Unlike most current pedestrian detectors, which extract context information only from a single, fixed region, we develop a new pixel-level context embedding module that integrates multi-cue context into a deep CNN feature hierarchy. The module gives access to the context of various regions through multi-branch convolution layers with different receptive fields. In addition, to exploit the distinctive visual patterns formed by pedestrians who appear in groups and occlude each other, we propose a novel instance-level context prediction module, implemented as a two-person detector, to improve one-person detection performance. With these strategies, we obtain an efficient and lightweight detector that can be trained end to end. We evaluate the proposed approach on two popular pedestrian detection datasets, Caltech and CityPersons. Extensive experimental results demonstrate the effectiveness of the proposed method, especially under heavy occlusion.
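To make the idea of multi-branch convolution with different receptive fields concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the architecture of the paper: the use of dilation to enlarge the receptive field, the averaging weights, and the helper names `effective_kernel_size`, `dilated_conv1d`, and `multi_branch_context` are all assumptions made for illustration. The sketch only shows how parallel branches can look at regions of different extent around the same position and how their responses can be stacked into one feature vector per position.

```python
def effective_kernel_size(kernel_size, dilation=1):
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, weights, dilation=1):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    k = len(weights)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            # Tap j sits (j - k // 2) * dilation positions away from the centre.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_branch_context(signal, branches):
    """Run every (weights, dilation) branch and stack the outputs per position."""
    outputs = [dilated_conv1d(signal, w, d) for w, d in branches]
    return [list(values) for values in zip(*outputs)]


# Hypothetical setup: three 3-tap averaging branches with dilation 1, 2, and 4.
branches = [([1 / 3] * 3, d) for d in (1, 2, 4)]
receptive_fields = [effective_kernel_size(3, d) for _, d in branches]

# A single impulse in the middle of the signal.
signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]
features = multi_branch_context(signal, branches)
```

At position 0 of this toy signal, only the branch with dilation 4 reaches the impulse at position 4, so only the widest branch responds there. This is the sense in which branches with larger receptive fields contribute context from more distant regions.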
