Spatio-Contextual Deep Network-Based Multimodal Pedestrian Detection for Autonomous Driving

Pedestrian detection is the most critical module of an autonomous driving system. Although a visible-light (RGB) camera is commonly used for this purpose, its image quality degrades severely in low-light nighttime driving scenarios. The quality of a thermal camera image, by contrast, remains unaffected in such conditions. This paper proposes an end-to-end multimodal fusion model for pedestrian detection using RGB and thermal images. Its novel spatio-contextual deep network architecture is designed to exploit the multimodal input efficiently. It consists of two distinct deformable ResNeXt-50 encoders that extract features from the two modalities. The two sets of encoded features are fused inside a multimodal feature embedding module (MuFEm) consisting of several groups, each pairing a Graph Attention Network with a feature fusion unit. The output of the last feature fusion unit of MuFEm is then passed to two conditional random fields (CRFs) for spatial refinement. The features are further enhanced by applying channel-wise attention and by extracting contextual information with four recurrent neural networks (RNNs), each traversing the feature map in a different direction. Finally, a single-stage decoder uses these feature maps to generate the bounding box of each pedestrian and the score map. We have performed extensive experiments with the proposed framework on three publicly available multimodal pedestrian detection benchmark datasets, namely KAIST, CVC-14, and UTokyo. On each of them, the results improve on the respective state-of-the-art performance. A short video giving an overview of this work along with its qualitative results can be seen at https://youtu.be/FDJdSifuuCs. Our source code will be released upon publication of the paper.
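
The idea of collecting context with four directional RNN traversals can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the recurrent cell, so the sketch assumes a single-channel feature map and a simple ReLU recurrence h[i] = max(0, x[i] + w * h[i-1]) with one scalar weight. The names `sweep`, `four_direction_context`, and `weight` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def sweep(values: List[float], weight: float) -> List[float]:
    """Assumed cell: ReLU recurrence h[i] = max(0, x[i] + weight * h[i-1])."""
    hidden = 0.0
    out = []
    for x in values:
        hidden = max(0.0, x + weight * hidden)
        out.append(hidden)
    return out


def four_direction_context(feature_map: Grid, weight: float = 0.5) -> List[Grid]:
    """Traverse a 2-D map left-to-right, right-to-left, top-to-bottom, bottom-to-top."""
    rows, cols = len(feature_map), len(feature_map[0])

    # Horizontal passes: each row is treated as one sequence.
    left_to_right = [sweep(row, weight) for row in feature_map]
    right_to_left = [sweep(row[::-1], weight)[::-1] for row in feature_map]

    # Vertical passes: each column is treated as one sequence.
    columns = [[feature_map[r][c] for r in range(rows)] for c in range(cols)]
    down = [sweep(col, weight) for col in columns]
    up = [sweep(col[::-1], weight)[::-1] for col in columns]

    # Transpose the column-wise results back to row-major layout.
    top_to_bottom = [[down[c][r] for c in range(cols)] for r in range(rows)]
    bottom_to_top = [[up[c][r] for c in range(cols)] for r in range(rows)]

    return [left_to_right, right_to_left, top_to_bottom, bottom_to_top]


if __name__ == "__main__":
    for context_map in four_direction_context([[1.0, 0.0], [0.0, 2.0]]):
        print(context_map)
```

After the four passes, every cell has a value from each direction, so in combination it carries information from its whole row and column. A trained model would use learned weights and multi-channel features rather than the single scalar assumed here.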
