Generalization ability of region proposal networks for multispectral person detection

Multispectral person detection aims at automatically localizing humans in images that consist of multiple spectral bands. Usually, the visual-optical (VIS) and the thermal infrared (IR) spectra are combined to achieve higher robustness for person detection especially in insufficiently illuminated scenes. This paper focuses on analyzing existing detection approaches for their generalization ability. Generalization is a key feature for machine learning based detection algorithms that are supposed to perform well across different datasets. Inspired by recent literature regarding person detection in the VIS spectrum, we perform a cross-validation study to empirically determine the most promising dataset to train a well-generalizing detector. Therefore, we pick one reference Deep Convolutional Neural Network (DCNN) architecture and three different multispectral datasets. The Region Proposal Network (RPN) originally introduced for object detection within the popular Faster R-CNN is chosen as a reference DCNN. The reason is that a stand-alone RPN is able to serve as a competitive detector for two-class problems such as person detection. Furthermore, current state-of-the-art approaches initially apply an RPN followed by individual classifiers. The three considered datasets are the KAIST Multispectral Pedestrian Benchmark including recently published improved annotations for training and testing, the Tokyo Multi-spectral Semantic Segmentation dataset, and the OSU Color-Thermal dataset including recently released annotations. The experimental results show that the KAIST Multispectral Pedestrian Benchmark with its improved annotations provides the best basis to train a DCNN with good generalization ability compared to the other two multispectral datasets. On average, this detection model achieves a log-average Miss Rate (MR) of 29.74 % evaluated on the reasonable test subsets of the three datasets.

[1]  Kihong Park,et al.  Unified multi-spectral pedestrian detection based on probabilistic fusion networks , 2018, Pattern Recognit..

[2]  Kihong Park,et al.  Multi-spectral pedestrian detection based on accumulated object proposal with fully convolutional networks , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[3]  Johannes Ulén,et al.  Higher-Order Regularization in Computer Vision , 2014 .

[4]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Chengyang Li,et al.  Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation , 2018, BMVC.

[6]  Jürgen Beyerer,et al.  Low Resolution Person Detection with a Moving Thermal Infrared Camera by Hot Spot Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[7]  Claus Nebauer,et al.  Evaluation of convolutional neural networks for visual recognition , 1998, IEEE Trans. Neural Networks.

[8]  Heiko Neumann,et al.  Fully Convolutional Region Proposal Networks for Multispectral Person Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Chengyang Li,et al.  Illumination-aware Faster R-CNN for Robust Multispectral Pedestrian Detection , 2018, Pattern Recognit..

[10]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Bernt Schiele,et al.  Ten Years of Pedestrian Detection, What Have We Learned? , 2014, ECCV Workshops.

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  James W. Davis,et al.  Fusion-Based Background-Subtraction using Contour Saliency , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[14]  Michael Ying Yang,et al.  Fusion of Multispectral Data Through Illumination-aware Deep Neural Networks for Pedestrian Detection , 2018, Inf. Fusion.

[15]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Daniel König,et al.  Deep Learning for Person Detection in Multi-spectral Videos , 2017 .

[17]  Antonio M. López,et al.  Adapting Pedestrian Detection from Synthetic to Far Infrared Images , 2013 .

[18]  Vittorio Murino,et al.  Low-level multimodal integration on Riemannian manifolds for automatic pedestrian detection , 2012, 2012 15th International Conference on Information Fusion.

[19]  Jürgen Beyerer,et al.  CNN-based thermal infrared person detection by domain adaptation , 2018, Defense + Security.

[20]  Xiaoming Liu,et al.  Illuminating Pedestrians via Simultaneous Detection and Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[22]  Matthias Bethge,et al.  Generalisation in humans and deep neural networks , 2018, NeurIPS.

[23]  Namil Kim,et al.  Multispectral pedestrian detection: Benchmark dataset and baseline , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[25]  Nathan Srebro,et al.  Exploring Generalization in Deep Learning , 2017, NIPS.

[26]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Sven Behnke,et al.  Multispectral Pedestrian Detection using Deep Fusion Convolutional Neural Networks , 2016, ESANN.

[28]  James W. Davis,et al.  Background-subtraction using contour-based fusion of thermal and visible imagery , 2007, Comput. Vis. Image Underst..

[29]  Shu Wang,et al.  Multispectral Deep Neural Networks for Pedestrian Detection , 2016, BMVC.

[30]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[32]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Peyman Milanfar,et al.  Linear Support Tensor Machine With LSK Channels: Pedestrian Detection in Thermal Infrared Images , 2016, IEEE Transactions on Image Processing.

[34]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[35]  Tatsuya Harada,et al.  MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[36]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[37]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[38]  B. Schiele,et al.  How Far are We from Solving Pedestrian Detection? , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[40]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[41]  Riad I. Hammoud,et al.  Thermal-Visible Video Fusion for Moving Target Tracking and Pedestrian Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.