Dense Cross-Modal Correspondence Estimation With the Deep Self-Correlation Descriptor

We present the deep self-correlation (DSC) descriptor for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. We encode local self-similar structure in a pyramidal manner that yields both more precise localization and greater robustness to non-rigid image deformations. Specifically, DSC first computes multiple self-correlation surfaces with randomly sampled patches over a local support window, and then builds pyramidal self-correlation surfaces through average pooling on the surfaces. The feature responses on the self-correlation surfaces are then encoded through spatial pyramid pooling in a log-polar configuration. To better handle geometric variations such as scale and rotation, we additionally propose the geometry-invariant DSC (GI-DSC), which leverages multi-scale self-correlation computation and canonical orientation estimation. In contrast to descriptors based on deep convolutional neural networks (CNNs), DSC and GI-DSC are training-free (i.e., handcrafted descriptors), are robust to cross-modal appearance differences, and generalize well to various modality variations. Extensive experiments demonstrate the state-of-the-art performance of DSC and GI-DSC on challenging cross-modal image pairs with photometric and/or geometric variations.
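The first two stages described above (self-correlation surfaces from randomly sampled patches, followed by average pooling into a pyramid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of normalized cross-correlation as the similarity measure, the window and patch sizes, and the pooling sizes are all assumptions chosen for illustration, and the log-polar spatial pyramid pooling stage is omitted.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def self_correlation_surface(img, center, offset, support=5, patch=2):
    """Correlate one reference patch (at center + offset) with every patch
    in the (2*support+1)^2 support window around `center`."""
    cy, cx = center
    ry, rx = cy + offset[0], cx + offset[1]
    ref = img[ry - patch: ry + patch + 1, rx - patch: rx + patch + 1]
    size = 2 * support + 1
    surf = np.empty((size, size))
    for dy in range(-support, support + 1):
        for dx in range(-support, support + 1):
            y, x = cy + dy, cx + dx
            tgt = img[y - patch: y + patch + 1, x - patch: x + patch + 1]
            surf[dy + support, dx + support] = ncc(ref, tgt)
    return surf

def average_pool(surf, k):
    """Non-overlapping k x k average pooling (trailing rows/columns are cropped)."""
    h, w = surf.shape
    h2, w2 = h // k, w // k
    return surf[:h2 * k, :w2 * k].reshape(h2, k, w2, k).mean(axis=(1, 3))

def pyramid_descriptor(img, center, n_patches=4, support=5, patch=2,
                       pool_sizes=(1, 2, 4), seed=0):
    """Hypothetical simplified descriptor: pooled self-correlation surfaces
    for several randomly sampled reference patches, concatenated and
    L2-normalized. `center` must lie at least support + patch pixels
    from the image border."""
    rng = np.random.default_rng(seed)
    offsets = rng.integers(-support, support + 1, size=(n_patches, 2))
    feats = []
    for off in offsets:
        surf = self_correlation_surface(img, center, off, support, patch)
        for k in pool_sizes:
            feats.append(average_pool(surf, k).ravel())
    d = np.concatenate(feats)
    return d / (np.linalg.norm(d) + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = rng.random((32, 32))
    desc = pyramid_descriptor(image, (16, 16))
    print(desc.shape)
```

Because each entry is a correlation between two patches of the same image, this sketch is unchanged under a global affine intensity remapping of the input, including contrast reversal. That property is one intuition for why self-correlation structure tends to survive modality changes, although real cross-modal differences are generally not globally affine.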
