A fast RetinaNet fusion framework for multi-spectral pedestrian detection

Abstract Mainstream visible-light pedestrian detectors are easily degraded by ambient lighting, complex backgrounds, and pedestrian distance. Infrared (IR) images can compensate for these weaknesses of visible images because they are insensitive to illumination conditions. Building on the Deep Convolutional Neural Network (DCNN), we propose a multispectral pedestrian detector that combines visual-optical (VIS) and infrared (IR) images. We design three DCNN fusion architectures to study which stage of the two-branch DCNN is better suited to fusion. In addition, we compare three fusion strategies and find that the sum fusion strategy performs better for our multispectral detector. Our multispectral pedestrian detectors are more adaptable to around-the-clock applications such as autonomous driving and unattended monitoring. Tested on the public multispectral benchmark dataset KAIST, our best fusion architecture achieves a log-average miss rate of 27.60%, comparable to the state-of-the-art detector, but with half the runtime.
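For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a log-average miss rate is commonly computed in pedestrian detection benchmarks. It assumes the widely used convention of averaging the miss rate, in log space, at nine false-positives-per-image (FPPI) reference points evenly log-spaced between 10^-2 and 10^0. The function name, its arguments, and the handling of curves that start above a reference point are illustrative assumptions, not the evaluation code used for the reported 27.60%.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI points.

    fppi      -- FPPI values of the detector's curve, in ascending order
    miss_rate -- miss rate at each FPPI value (same length as fppi)

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")

    # Reference points evenly spaced in log space over [1e-2, 1e0].
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    logs = []
    for ref in refs:
        # Use the miss rate at the largest FPPI not exceeding the reference.
        # If the curve has no such point, assume nothing is detected (1.0).
        sampled = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                sampled = m
            else:
                break
        # Clamp to avoid log(0) for a perfect detector.
        logs.append(math.log(max(sampled, 1e-10)))

    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Toy curve: miss rate drops from 0.8 to 0.2 as FPPI grows.
    print(log_average_miss_rate([0.005, 0.5], [0.8, 0.2]))
```

Because the average is taken in log space, the result is a geometric mean, so a detector is rewarded for low miss rates across the whole FPPI range rather than at a single operating point.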
