Performance evaluation of similarity measures for dense multimodal stereovision

Abstract. Multimodal imaging systems have recently been drawing attention in fields such as medical imaging, remote sensing, and video surveillance. In such systems, depth estimation has become possible thanks to progress in multimodal matching techniques. We perform a systematic performance evaluation of similarity measures frequently used in the literature for dense multimodal stereovision. The evaluated measures include mutual information (MI), sum of squared distances, normalized cross-correlation, census transform, and local self-similarity (LSS), as well as descriptors adapted to multimodal settings, such as scale invariant feature transform (SIFT), speeded-up robust features (SURF), histogram of oriented gradients (HOG), binary robust independent elementary features (BRIEF), and fast retina keypoint (FREAK). We evaluate the measures on datasets that we generated, compiled, and provide as a benchmark, and we compare their performance using the winner-takes-all method. The datasets are (1) four popular pairs from the Middlebury Stereo Dataset (namely, Tsukuba, Venus, Cones, and Teddy), synthetically modified, and (2) our own multimodal image pairs acquired using the infrared and electro-optical cameras of a Kinect device. The results show that MI and HOG provide promising results for multimodal imagery, and that FREAK, SURF, SIFT, and LSS can be considered as alternatives depending on the multimodality level and the computational complexity requirements of the intended application.
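To make the evaluation setting concrete, the following is a minimal sketch of winner-takes-all disparity selection driven by a window-based similarity measure, here mutual information estimated from a joint histogram. It is not the authors' implementation: the function names, the 8-bin histogram, the 7x7 window, and the convention that a left-image pixel at column x matches the right-image pixel at column x - d are all assumptions made for illustration.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """MI (in nats) between two equally sized patches.

    Estimated from a joint histogram; the bin count is an illustrative choice.
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b, shape (1, bins)
    nz = pxy > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def wta_disparity(left, right, max_disp, half=3, similarity=mutual_information):
    """Winner-takes-all: per pixel, keep the disparity with the highest similarity.

    Assumes rectified images and that left[y, x] matches right[y, x - d].
    Border pixels without a full window or disparity range are left at 0.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            scores = [
                similarity(
                    ref,
                    right[y - half:y + half + 1, x - d - half:x - d + half + 1],
                )
                for d in range(max_disp + 1)
            ]
            disp[y, x] = int(np.argmax(scores))
    return disp
```

Any other measure that returns a higher-is-better score for two patches can be passed as `similarity`; a cost such as the sum of squared distances would need to be negated first. Because mutual information depends only on the statistical relationship between intensities, this sketch still recovers the shift when one image is an intensity-inverted copy of the other, which is the kind of situation that motivates such measures for multimodal pairs.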
