CoMIR: Contrastive Multimodal Image Representation for Registration

We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations). CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures. CoMIRs reduce the multimodal registration problem to a monomodal one, in which general intensity-based as well as feature-based registration algorithms can be applied. The method involves training one neural network per modality on aligned images, using a contrastive loss based on noise-contrastive estimation (InfoNCE). Unlike other contrastive coding methods, used, e.g., for classification, our approach generates image-like representations that contain the information shared between modalities. We introduce a novel, hyperparameter-free modification to InfoNCE to enforce rotational equivariance of the learnt representations, a property essential to the registration task. We assess the extent of achieved rotational equivariance and the stability of the representations with respect to weight initialization, training set, and hyperparameter settings on a remote sensing dataset of RGB and near-infrared images. We evaluate the learnt representations through registration of a biomedical dataset of bright-field and second-harmonic generation microscopy images: two modalities with very little apparent correlation. The proposed approach based on CoMIRs significantly outperforms registration of representations created by GAN-based image-to-image translation, as well as a state-of-the-art, application-specific method that takes additional knowledge about the data into account. Code is available at: this https URL.
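The training objective summarized above can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and it rests on several assumptions:

- The critic is cosine similarity scaled by a temperature `tau`.
- Each dense representation is flattened into one vector per patch.
- The negatives for a patch are the other patches of the opposite modality in the same batch.
- The rotational-equivariance modification is modelled as rotating the input of one modality and the output of the other by the same multiple of 90 degrees before comparing them.

All helper names (`info_nce`, `rot90`, `equivariant_views`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Grid = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as the critic (an assumption).
    A small epsilon guards against division by zero for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def info_nce(z1: List[Vector], z2: List[Vector], tau: float = 0.1) -> float:
    """Symmetric InfoNCE over n aligned pairs (z1[i], z2[i]).

    For each i, the aligned partner is the positive and the other n - 1
    representations of the opposite modality act as negatives.
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        # Direction modality 1 -> modality 2: -log softmax at the positive.
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        total += logsumexp(logits) - logits[i]
        # Direction modality 2 -> modality 1.
        logits = [cosine(z2[i], z1[j]) / tau for j in range(n)]
        total += logsumexp(logits) - logits[i]
    return total / (2 * n)


def rot90(grid: Grid, k: int) -> Grid:
    """Rotate a 2-D grid by k * 90 degrees counter-clockwise."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one 90-degree CCW turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def flatten(grid: Grid) -> Vector:
    """Flatten a dense 2-D representation into a single vector."""
    return [v for row in grid for v in row]


def equivariant_views(x1: Grid, x2: Grid,
                      f1: Callable[[Grid], Grid],
                      f2: Callable[[Grid], Grid],
                      k: int) -> Tuple[Vector, Vector]:
    """Hypothetical helper: rotate the *input* of modality 1 and the
    *output* of modality 2 by the same k * 90 degrees, so that the two
    views only match when f1 is (approximately) rotation equivariant."""
    return flatten(f1(rot90(x1, k))), flatten(rot90(f2(x2), k))
```

In a training loop under these assumptions, `k` would be drawn at random per pair and the two views fed to `info_nce` in place of the unrotated representations. The toy pixelwise map used in the tests commutes with rotation, so its two views coincide. A network pair that is not rotation equivariant would produce mismatched views and, in this sketch, a higher loss. Restricting `k` to multiples of 90 degrees avoids interpolation and introduces no extra hyperparameter. That is consistent with the abstract's "hyperparameter-free" claim, though the exact form used in the paper may differ.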
