Adaptive spatial pixel-level feature fusion network for multispectral pedestrian detection

Abstract

Under challenging illumination conditions, a pedestrian detector that takes visible and thermal infrared image pairs as input outperforms a detector that uses only visible images. To fuse the complementary information in visible and thermal infrared images efficiently and effectively, this paper proposes an adaptive spatial pixel-level feature fusion network, called ASPFF Net, which adaptively extracts spatial and pixel-level features from visible and infrared images for fusion. First, two lightweight networks with separate (unshared) weights extract multi-scale features from the visible and infrared images. Next, for each pair of same-scale features from the two modalities, a spatial attention module (SAM) and a pixel attention module (PAM) compute fusion weights for the different spatial positions and pixels of the two feature maps. These fusion weights recalibrate the original visible and infrared features, producing multi-scale fused feature layers. Finally, pedestrians of different scales are detected on the fused multi-scale feature layers. Compared with other recent multispectral pedestrian detectors on the reasonable subset of the KAIST multispectral pedestrian detection dataset, the proposed detector strikes an attractive balance between speed and accuracy. Extensive experiments on the KAIST dataset demonstrate the effectiveness of the proposed method for fusing visible and infrared images in multispectral pedestrian detection.
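The abstract does not say how SAM and PAM compute their scores or how the visible and infrared weights are normalized, so the following is only a hypothetical, dependency-free sketch of the recalibration step under stated assumptions: CBAM-style channel average and max pooling stand in for the spatial scores, raw feature values stand in for learned per-pixel scores, and an ASFF-style softmax across the two modalities makes the visible and infrared weights at each element sum to one. The function names (`spatial_logits`, `pixel_logits`, `fuse`, `fuse_pyramid`) are illustrative and are not taken from the paper, and the trained convolutions a real network would use are replaced by fixed operations.

```python
# Hypothetical sketch of per-pixel modality fusion; NOT the ASPFF Net implementation.
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]
Map2 = List[List[float]]           # single-channel map indexed as [row][col]


def spatial_logits(feat: Tensor3) -> Map2:
    """Spatial score per position: channel-wise mean plus channel-wise max.

    Assumed stand-in for a learned SAM; a trained module would pass these
    pooled descriptors through a convolution instead of simply adding them.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    out = []
    for y in range(H):
        row = []
        for x in range(W):
            column = [feat[c][y][x] for c in range(C)]
            row.append(sum(column) / C + max(column))
        out.append(row)
    return out


def pixel_logits(feat: Tensor3) -> Tensor3:
    """Per-element score; identity stand-in for a learned PAM (e.g. a 1x1 conv)."""
    return [[row[:] for row in ch] for ch in feat]


def fuse(vis: Tensor3, ir: Tensor3) -> Tensor3:
    """Fuse same-scale visible and infrared maps with per-element weights.

    Each modality's logit is its spatial score plus its pixel score; a two-way
    softmax turns the pair into weights w_vis + w_ir = 1, and the fused value
    is w_vis * vis + w_ir * ir (a convex combination of the two inputs).
    """
    s_vis, s_ir = spatial_logits(vis), spatial_logits(ir)
    p_vis, p_ir = pixel_logits(vis), pixel_logits(ir)
    C, H, W = len(vis), len(vis[0]), len(vis[0][0])
    fused = []
    for c in range(C):
        ch = []
        for y in range(H):
            row = []
            for x in range(W):
                a = s_vis[y][x] + p_vis[c][y][x]
                b = s_ir[y][x] + p_ir[c][y][x]
                m = max(a, b)  # subtract the max for a numerically stable softmax
                ea, eb = math.exp(a - m), math.exp(b - m)
                w_vis = ea / (ea + eb)
                row.append(w_vis * vis[c][y][x] + (1.0 - w_vis) * ir[c][y][x])
            ch.append(row)
        fused.append(ch)
    return fused


def fuse_pyramid(vis_levels: List[Tensor3], ir_levels: List[Tensor3]) -> List[Tensor3]:
    """Apply the fusion independently at every scale of the two feature pyramids."""
    return [fuse(v, i) for v, i in zip(vis_levels, ir_levels)]


if __name__ == "__main__":
    # Dark scene: the visible response is weak while the thermal response is strong.
    vis = [[[0.1, 0.0], [0.0, 0.1]]]  # 1 channel, 2x2
    ir = [[[2.0, 0.0], [0.0, 2.0]]]
    print(fuse(vis, ir))
```

In this toy dark scene the fused value at a pedestrian-like location lands close to the infrared response, because the thermal branch produces the larger logit there. Normalizing across modalities (rather than applying an independent sigmoid to each) is the assumption that keeps every fused element between the visible and infrared values.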
