Thermal–visible registration of human silhouettes: A similarity measure performance evaluation

Abstract

When registering information from different image sources, the de facto similarity measure is Mutual Information (MI). Although MI performs well in many image registration applications, recent work on thermal–visible registration has shown that other similarity measures can give results that are as accurate as MI, if not more so. Furthermore, some of these measures can be computed independently on each of the images to be registered, which makes them easier to integrate into energy minimization frameworks. In this article, we investigate the accuracy of similarity measures for thermal–visible image registration of human silhouettes, including MI, Sum of Squared Differences (SSD), Normalized Cross-Correlation (NCC), Histograms of Oriented Gradients (HOG), Local Self-Similarity (LSS), Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Census, Fast Retina Keypoint (FREAK), and Binary Robust Independent Elementary Features (BRIEF). We tested the similarity measures in dense stereo matching tasks over 25,000 windows to obtain statistically significant results. To do so, we created a new dataset in which one to five people walk through a scene at various depth planes. Results show that although MI is a very strong performer, particularly for large regions of interest (ROIs), LSS is more accurate when ROIs are small or segmented into small fragments, because of its ability to capture shape. The other similarity measures tested did not give consistently accurate results.
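To make the difference between the pixel-based measures concrete, the following minimal Python sketch (an illustration, not the authors' evaluation code) computes SSD, zero-mean NCC and MI between two windows given as flat lists of 8-bit intensities. The function names, the 16-bin quantization for MI and the toy "thermal" window are assumptions. The toy pair is a window and its intensity-inverted copy, a crude stand-in for the contrast reversal that can occur between thermal and visible imagery: SSD is large and NCC reports perfect anti-correlation, while MI is unchanged because it depends only on the joint histogram.

```python
import math
from collections import Counter
from typing import Sequence


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-size windows (lower = more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation in [-1, 1] (higher = more similar)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat window: correlation undefined, report 0


def mutual_information(a: Sequence[int], b: Sequence[int],
                       bins: int = 16, max_val: int = 256) -> float:
    """MI in bits, estimated from the joint histogram of quantized intensities.
    The joint histogram needs both windows at once, so MI cannot be
    precomputed on each image separately."""
    qa = [min(int(x) * bins // max_val, bins - 1) for x in a]
    qb = [min(int(y) * bins // max_val, bins - 1) for y in b]
    n = len(qa)
    joint = Counter(zip(qa, qb))
    pa, pb = Counter(qa), Counter(qb)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log2(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi


if __name__ == "__main__":
    visible = [10, 40, 80, 120, 160, 200, 230, 250]
    thermal = [255 - v for v in visible]          # toy stand-in: contrast fully inverted
    print(ssd(visible, thermal))                  # 222400: intensities disagree strongly
    print(ncc(visible, thermal))                  # -1.0: perfectly anti-correlated
    print(mutual_information(visible, thermal))   # 3.0 bits, same as MI(visible, visible)
```

Because the inversion is a one-to-one mapping of intensity bins, MI reaches its maximum (the entropy of the quantized window) even though the raw intensities disagree everywhere.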
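Census is one of the measures that can be computed on each image independently: every pixel is replaced by a bit string recording which of its neighbors are darker than it, and windows are then compared by Hamming distance. The minimal Python sketch below (an illustration, not the evaluation code used in the article) pairs a 3×3 census transform with a winner-take-all disparity search along one scanline, as in a basic dense stereo matcher. The neighborhood radius, window size, function names and the toy image pair are assumptions. Census codes are unchanged by any strictly increasing intensity mapping, which the toy pair exploits, but a contrast inversion flips the comparison bits.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Codes = List[List[Optional[int]]]


def census(img: Image, r: int = 1) -> Codes:
    """Census transform: each pixel becomes a bit string recording which of its
    (2r+1)^2 - 1 neighbors are darker than it. Pixels within r of the border
    have no full neighborhood and are set to None."""
    h, w = len(img), len(img[0])
    out: Codes = [[None] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            center = img[y][x]
            bits = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx == 0:
                        continue
                    bits = (bits << 1) | (1 if img[y + dy][x + dx] < center else 0)
            out[y][x] = bits
    return out


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def best_disparity(left: Codes, right: Codes, y: int, x: int,
                   max_disp: int, half: int = 1) -> Tuple[Optional[int], Optional[int]]:
    """Winner-take-all search: compare the (2*half+1)^2 window of census codes
    centered on (y, x) in the left image with the window centered on (y, x - d)
    in the right image, for d = 0..max_disp. Returns (best_d, best_cost) and
    skips any d whose window leaves the valid census area."""
    h, w = len(left), len(left[0])
    best_d, best_cost = None, None
    for d in range(max_disp + 1):
        cost, valid = 0, True
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                yy, xl, xr = y + dy, x + dx, x + dx - d
                if not (0 <= yy < h and 0 <= xr and 0 <= xl < w):
                    valid = False
                    break
                a, b = left[yy][xl], right[yy][xr]
                if a is None or b is None:
                    valid = False
                    break
                cost += hamming(a, b)
            if not valid:
                break
        if valid and (best_cost is None or cost < best_cost):
            best_d, best_cost = d, cost
    return best_d, best_cost


if __name__ == "__main__":
    # Toy pair: one bright pixel, shifted 2 px and remapped by 2*v + 5 (monotonic).
    h, w = 7, 12
    left = [[0] * w for _ in range(h)]
    left[3][7] = 100
    right = [[2 * left[y][x + 2] + 5 if x + 2 < w else 5 for x in range(w)]
             for y in range(h)]
    print(best_disparity(census(left), census(right), y=3, x=7, max_disp=4))  # (2, 0)
```

Because the transform is applied to each image once, the per-pixel codes can be cached and reused across all candidate disparities, which is the practical advantage of image-independent measures noted above.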
