PCNet: A Structure Similarity Enhancement Method for Multispectral and Multimodal Image Registration

Multispectral and multimodal image processing is important in computer vision and computational photography. Because the acquired multispectral and multimodal data are generally misaligned, owing to changes to or movement of the imaging device, image registration is necessary. Registering multispectral or multimodal images is challenging because of non-linear intensity and gradient variations. To cope with this challenge, we propose the phase congruency network (PCNet), which enhances structure similarity and alleviates non-linear intensity and gradient variations. The images can then be aligned using the similarity-enhanced features produced by the network. PCNet is constructed under the guidance of the phase congruency prior: it contains three trainable layers together with learnable Gabor kernels modified according to phase congruency theory. Thanks to this prior knowledge, PCNet is extremely lightweight and can be trained on a small amount of multispectral data. PCNet can be viewed as fully convolutional and hence accepts inputs of arbitrary size. Once trained, PCNet is applicable to a variety of multispectral and multimodal data, such as RGB/NIR and flash/no-flash images, without further tuning. Experimental results confirm that PCNet outperforms current state-of-the-art registration algorithms, including deep-learning-based ones with hundreds of times more parameters. Thanks to the similarity enhancement training, PCNet also outperforms the original phase congruency algorithm while using two-thirds fewer feature channels.
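The phase congruency prior that PCNet builds on can be illustrated with a small, self-contained sketch. It is not PCNet: it uses fixed (not learned) Gabor quadrature pairs on 1D signals and computes only the basic local-energy measure PC(x) = |sum_n r_n(x)| / (sum_n |r_n(x)| + eps), where r_n = e_n + i*o_n is the even/odd filter response at scale n. It omits the noise compensation, frequency-spread weighting and log-Gabor filters of Kovesi's full formulation. The scales (wavelengths 4, 8, 16), the envelope width sigma = wavelength/2, the zero-mean even kernels, replicate padding, eps, the mean-squared-difference shift search and all function names are illustrative assumptions. The demo builds two "bands" of the same step edge, the second shifted, contrast-reversed and offset, and recovers the shift by matching their PC maps rather than their intensities.

```python
import math

# Illustrative sketch only: fixed 1D Gabor pairs and the basic local-energy
# phase congruency measure. Names and parameter values are hypothetical.


def gabor_pair(wavelength, sigma, half_width):
    """Even (cosine) and odd (sine) 1D Gabor kernels for one scale.

    The even kernel is shifted to zero mean so that a constant intensity
    offset produces no response; the odd kernel is zero-mean by antisymmetry.
    """
    xs = range(-half_width, half_width + 1)
    env = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in xs]
    even = [g * math.cos(2.0 * math.pi * x / wavelength) for g, x in zip(env, xs)]
    odd = [g * math.sin(2.0 * math.pi * x / wavelength) for g, x in zip(env, xs)]
    mean_even = sum(even) / len(even)
    return [v - mean_even for v in even], odd


def correlate(signal, kernel):
    """Correlate `signal` with an odd-length `kernel`, replicating edge samples."""
    n, h = len(signal), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - h, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def phase_congruency_1d(signal, wavelengths=(4.0, 8.0, 16.0), eps=1e-4):
    """Basic local-energy phase congruency: |sum_n r_n| / (sum_n |r_n| + eps).

    r_n = e_n + i*o_n is the quadrature response at scale n. No noise
    compensation or frequency-spread weighting is applied.
    """
    n = len(signal)
    sum_e, sum_o, sum_amp = [0.0] * n, [0.0] * n, [0.0] * n
    for wl in wavelengths:
        sigma = 0.5 * wl  # envelope width tied to the wavelength (an assumption)
        even_k, odd_k = gabor_pair(wl, sigma, int(math.ceil(3.0 * sigma)))
        e, o = correlate(signal, even_k), correlate(signal, odd_k)
        for i in range(n):
            sum_e[i] += e[i]
            sum_o[i] += o[i]
            sum_amp[i] += math.hypot(e[i], o[i])  # amplitude A_n(x)
    # Local energy E(x) = |sum of complex responses|, normalised by total amplitude.
    return [math.hypot(se, so) / (sa + eps) for se, so, sa in zip(sum_e, sum_o, sum_amp)]


def estimate_shift(feat_a, feat_b, max_shift):
    """Integer shift s minimising the mean squared difference of feat_a[i] and feat_b[i + s]."""
    best_s, best_cost = 0, float("inf")
    n = len(feat_a)
    for s in range(-max_shift, max_shift + 1):
        pairs = [(feat_a[i], feat_b[i + s]) for i in range(n) if 0 <= i + s < n]
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s


if __name__ == "__main__":
    # "Band A": a step edge. "Band B": the same edge shifted by 5 samples,
    # with reversed contrast and an intensity offset.
    band_a = [0.0] * 40 + [1.0] * 40
    band_b = [5.0 - v for v in ([0.0] * 45 + [1.0] * 35)]
    pc_a, pc_b = phase_congruency_1d(band_a), phase_congruency_1d(band_b)
    print("shift from PC maps:", estimate_shift(pc_a, pc_b, max_shift=10))
```

Because the kernels are zero-mean, a constant offset leaves every response unchanged, and a sign flip negates all responses without changing |r_n| or |sum_n r_n|; the two PC maps therefore coincide up to the shift, which is what lets the search succeed. Real multispectral pairs involve non-linear intensity changes, which this exact invariance argument does not cover.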
