Paper information - Dense Cross-Modal Correspondence Estimation With the Deep Self-Correlation Descriptor

We present the deep self-correlation (DSC) descriptor for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. We encode local self-similar structure in a pyramidal manner that yields both more precise localization and greater robustness to non-rigid image deformations. Specifically, DSC first computes multiple self-correlation surfaces with randomly sampled patches over a local support window, and then builds pyramidal self-correlation surfaces by average pooling those surfaces. The feature responses on the self-correlation surfaces are then encoded through spatial pyramid pooling in a log-polar configuration. To better handle geometric variations such as scale and rotation, we additionally propose the geometry-invariant DSC (GI-DSC), which leverages multi-scale self-correlation computation and canonical orientation estimation. In contrast to descriptors based on deep convolutional neural networks (CNNs), DSC and GI-DSC are training-free (i.e., handcrafted descriptors), are robust to cross-modal appearance changes, and generalize well to various modality variations. Extensive experiments demonstrate the state-of-the-art performance of DSC and GI-DSC on challenging cross-modal image pairs having photometric and/or geometric variations.
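The pipeline summarized in the abstract (self-correlation surfaces from randomly sampled patches, average-pooled pyramids, log-polar pooling) can be illustrated with a rough NumPy sketch for a single pixel. This is not the authors' implementation. The abstract does not specify these details, so the following are assumptions made only for illustration: normalized cross-correlation as the similarity measure, max pooling inside each log-polar bin, the function names, and all parameter values (`support_r`, `patch_r`, `n_samples`, `pool_sizes`, `n_rho`, `n_theta`).

```python
import numpy as np

def patch(img, y, x, r):
    """Square patch of radius r centred at (y, x)."""
    return img[y - r:y + r + 1, x - r:x + r + 1]

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equal-sized patches (assumed measure)."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def self_correlation_surface(img, cy, cx, anchor, support_r, patch_r):
    """Correlate one sampled anchor patch with every patch in the support window."""
    ref = patch(img, anchor[0], anchor[1], patch_r)
    size = 2 * support_r + 1
    surf = np.empty((size, size))
    for dy in range(-support_r, support_r + 1):
        for dx in range(-support_r, support_r + 1):
            cand = patch(img, cy + dy, cx + dx, patch_r)
            surf[dy + support_r, dx + support_r] = ncc(ref, cand)
    return surf

def avg_pool(surf, k):
    """Non-overlapping k x k average pooling (one pyramid level)."""
    h, w = surf.shape
    h2, w2 = h // k * k, w // k * k
    return surf[:h2, :w2].reshape(h2 // k, k, w2 // k, k).mean(axis=(1, 3))

def log_polar_pool(surf, n_rho=3, n_theta=8):
    """Max response per log-polar bin around the surface centre (assumed pooling)."""
    h, w = surf.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - (h - 1) / 2.0, xs - (w - 1) / 2.0
    rho = np.log1p(np.hypot(dy, dx))
    theta = np.arctan2(dy, dx) % (2 * np.pi)
    rho_bin = np.minimum((rho / (rho.max() + 1e-8) * n_rho).astype(int), n_rho - 1)
    theta_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    out = np.zeros(n_rho * n_theta)  # empty bins stay at zero
    for r in range(n_rho):
        for t in range(n_theta):
            mask = (rho_bin == r) & (theta_bin == t)
            if mask.any():
                out[r * n_theta + t] = surf[mask].max()
    return out

def dsc_like_descriptor(img, cy, cx, support_r=6, patch_r=2,
                        n_samples=4, pool_sizes=(1, 2), seed=0):
    """Concatenate log-polar pooled responses over sampled anchors and pyramid levels."""
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(n_samples):
        # Randomly sample an anchor patch centre inside the support window.
        oy, ox = rng.integers(-support_r, support_r + 1, size=2)
        surf = self_correlation_surface(img, cy, cx, (cy + oy, cx + ox),
                                        support_r, patch_r)
        for k in pool_sizes:
            feats.append(log_polar_pool(avg_pool(surf, k)))
    d = np.concatenate(feats)
    return d / (np.linalg.norm(d) + 1e-8)

# Usage: the pixel must lie at least support_r + patch_r from the image border.
image = np.random.default_rng(1).random((32, 32))
descriptor = dsc_like_descriptor(image, 16, 16)
```

Because every correlation compares two patches from the same image, the sketch is unaffected by a global intensity inversion or an affine intensity change. That property is what makes self-similarity attractive for cross-modal matching. The geometry-invariant variant (multi-scale computation and canonical orientation estimation) is not covered here.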
