Occluded Pedestrian Detection Through Guided Attention in CNNs

Pedestrian detection has progressed significantly in the last years. However, occluded people are notoriously hard to detect, as their appearance varies substantially depending on a wide range of occlusion patterns. In this paper, we aim to propose a simple and compact method based on the FasterRCNN architecture for occluded pedestrian detection. We start with interpreting CNN channel features of a pedestrian detector, and we find that different channels activate responses for different body parts respectively. These findings motivate us to employ an attention mechanism across channels to represent various occlusion patterns in one single model, as each occlusion pattern can be formulated as some specific combination of body parts. Therefore, an attention network with self or external guidances is proposed as an add-on to the baseline FasterRCNN detector. When evaluating on the heavy occlusion subset, we achieve a significant improvement of 8pp to the baseline FasterRCNN detector on CityPersons and on Caltech we outperform the state-of-the-art method by 4pp.

[1]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[2]  B. Schiele,et al.  How Far are We from Solving Pedestrian Detection? , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Dariu Gavrila,et al.  Multi-cue pedestrian classification with partial occlusion handling , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Bernt Schiele,et al.  Towards Reaching Human Performance in Pedestrian Detection , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[6]  Luc Van Gool,et al.  Handling Occlusions with Franken-Classifiers , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[8]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[9]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[10]  Chunluan Zhou,et al.  Multi-label Learning of Part Detectors for Heavily Occluded Pedestrian Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Joachim Denzler,et al.  Part Detector Discovery in Deep Convolutional Neural Networks , 2014, ACCV.

[12]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Bernt Schiele,et al.  Ten Years of Pedestrian Detection, What Have We Learned? , 2014, ECCV Workshops.

[14]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Davide Modolo,et al.  Do Semantic Parts Emerge in Convolutional Neural Networks? , 2016, International Journal of Computer Vision.

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Bolei Zhou,et al.  Network Dissection: Quantifying Interpretability of Deep Visual Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[20]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[21]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaogang Wang,et al.  Pedestrian detection aided by deep learning semantic tasks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Luc Van Gool,et al.  A mobile vision system for robust multi-person tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[30]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[31]  Anton van den Hengel,et al.  Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features , 2014, ECCV.

[32]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Xiaogang Wang,et al.  Deep Learning Strong Parts for Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).