Deep similarity learning for multimodal medical images

An effective similarity measure for multi-modal images is crucial for medical image fusion in many clinical applications. The underlying correlation across modalities is usually too complex to be modelled by intensity-based statistical metrics. Therefore, approaches that learn a similarity metric have been proposed in recent years. In this work, we propose a novel deep similarity learning method that trains a binary classifier to learn the correspondence of two image patches. The classification output is transformed to a continuous probability value, which is then used as the similarity score. Moreover, we propose to utilise a multi-modal stacked denoising autoencoder to effectively pre-train the deep neural network. We train and test the proposed metric using sampled corresponding/non-corresponding computed tomography and magnetic resonance head image patches from the same subject. Comparison is made with two commonly used metrics: normalised mutual information and local cross correlation. The contributions of the multi-modal stacked denoising autoencoder and the deep structure of the neural network are also evaluated. Both the quantitative and qualitative results from the similarity ranking experiments demonstrate the advantage of the proposed metric in providing a highly accurate and robust similarity measure.
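
To make the patch-classification idea concrete, the following is a minimal sketch of a network that scores a CT/MR patch pair and uses the sigmoid output directly as the similarity value. It assumes a PyTorch-style implementation; the patch size, layer widths, and omission of the stacked-denoising-autoencoder pre-training are illustrative assumptions, not the configuration reported in the paper.

```python
# Illustrative sketch only: a small fully connected network that scores the
# correspondence of a CT/MR patch pair. The sigmoid output (probability of
# correspondence) serves as the similarity score. Patch size and layer widths
# are assumed values, not those from the paper.
import torch
import torch.nn as nn

PATCH = 17 * 17  # assumed 2-D patch size

class PatchSimilarityNet(nn.Module):
    def __init__(self, hidden=(512, 256, 128)):
        super().__init__()
        layers, in_dim = [], 2 * PATCH          # concatenated CT and MR patches
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers += [nn.Linear(in_dim, 1)]        # binary correspondence logit
        self.net = nn.Sequential(*layers)

    def forward(self, ct_patch, mr_patch):
        x = torch.cat([ct_patch.flatten(1), mr_patch.flatten(1)], dim=1)
        return torch.sigmoid(self.net(x)).squeeze(1)  # probability used as similarity

# Training uses binary cross-entropy on corresponding (label 1) and
# non-corresponding (label 0) patch pairs sampled from the same subject.
model = PatchSimilarityNet()
loss_fn = nn.BCELoss()
ct = torch.rand(8, PATCH)                      # toy batch of CT patches
mr = torch.rand(8, PATCH)                      # toy batch of MR patches
labels = torch.randint(0, 2, (8,)).float()
loss = loss_fn(model(ct, mr), labels)
loss.backward()
```

In the full method, the hidden layers would be initialised from a multi-modal stacked denoising autoencoder rather than at random; the sketch above only illustrates how the classifier's probabilistic output doubles as a continuous similarity measure.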