DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

Multi-modality image fusion aims to combine different modalities into fused images that retain the complementary features of each modality, such as functional highlights and texture details. To leverage strong generative priors and to address the unstable training and lack of interpretability of GAN-based generative methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM). The fusion task is formulated as a conditional generation problem under the DDPM sampling framework, which is further divided into an unconditional generation subproblem and a maximum likelihood subproblem. The latter is modeled in a hierarchical Bayesian manner with latent variables and inferred by the expectation-maximization (EM) algorithm. By integrating the inference solution into the diffusion sampling iteration, our method can generate high-quality fused images with natural image generative priors and cross-modality information from source images. Note that all that is required is an unconditional pre-trained generative model; no fine-tuning is needed. Our extensive experiments indicate that our approach yields promising fusion results in infrared-visible image fusion and medical image fusion. The code is available at https://github.com/Zhaozixiang1228/MMIF-DDFM.
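The split into an unconditional generation step and a likelihood step can be illustrated with a minimal sketch. It is not the released implementation, and several parts are assumptions made for illustration: `toy_eps_model` is a hypothetical stand-in for the pre-trained unconditional noise predictor, the schedule values are arbitrary, and `likelihood_correction` replaces the hierarchical Bayesian model and EM inference with a single gradient step on a simple quadratic data-fidelity term. Only the DDPM posterior update follows the standard formulas.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (values chosen for illustration only)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, t):
    """Hypothetical stand-in for a pre-trained unconditional noise predictor."""
    return np.zeros_like(x_t)


def likelihood_correction(x0_hat, ir, vis, step=0.25):
    """Assumed placeholder for the maximum likelihood subproblem.

    Takes one gradient step on 0.5*||x - ir||^2 + 0.5*||x - vis||^2.
    The paper instead uses a hierarchical Bayesian model solved by EM.
    """
    grad = (x0_hat - ir) + (x0_hat - vis)
    return x0_hat - step * grad


def sample_fused(ir, vis, eps_model=toy_eps_model, T=50, seed=0):
    """Reverse diffusion with a likelihood correction at every step."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(ir.shape)
    for t in range(T - 1, -1, -1):
        # Unconditional subproblem: estimate the clean image from x_t.
        eps = eps_model(x, t)
        x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
        # Likelihood subproblem: pull the estimate toward the source images.
        x0_hat = np.clip(likelihood_correction(x0_hat, ir, vis), 0.0, 1.0)
        if t == 0:
            return x0_hat
        # Standard DDPM posterior q(x_{t-1} | x_t, x0) with the corrected x0.
        ab_prev = alpha_bars[t - 1]
        coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - alpha_bars[t])
        coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - alpha_bars[t])
        var = betas[t] * (1.0 - ab_prev) / (1.0 - alpha_bars[t])
        x = coef_x0 * x0_hat + coef_xt * x + np.sqrt(var) * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    ir = np.full((8, 8), 0.8)   # toy "infrared" image
    vis = np.full((8, 8), 0.2)  # toy "visible" image
    fused = sample_fused(ir, vis)
    print(fused.shape, float(fused.min()), float(fused.max()))
```

In this sketch the generative prior and the source images interact only through the corrected clean-image estimate, which is why no retraining or fine-tuning of the noise predictor is involved.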
