PFAF-Net: Pyramid Feature Network for Multimodal Fusion

Multimodal image fusion is an essential research area in computer vision applications. However, many obstacles remain in the image fusion domain that cause the loss of key content, owing to method limitations or low efficiency. To address these problems, a novel pyramid feature attention fusion network (PFAF-Net) is proposed, built on multiscale features with the core idea of applying different fusion strategies at different levels. First, multiscale high-level features with different receptive fields are extracted by a pyramid feature extraction module. Second, high-level and low-level features are fused with different techniques; specifically, attention-based fusion strategies are adopted to efficiently fuse the global and local features separately. Finally, the fused image is reconstructed from the enhanced fused features by the decoder. Thus, the proposed method not only integrates multimodal data but also preserves rich content details, yielding superior image fusion. In addition, PFAF-Net shows better generalization ability than existing methods on four multimodal benchmark datasets (multimodal medical, multiexposure, multifocus, and visible-infrared). In experiments against other state-of-the-art fusion methods, the proposed method achieves comparable or better fusion performance in both subjective and objective evaluation.
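The abstract describes a three-stage pipeline: pyramid feature extraction, level-dependent attention-based fusion, and decoding. The PyTorch sketch below is a minimal illustration of that pipeline under stated assumptions, not the paper's implementation: the dilated-convolution pyramid, the global-average-pooling channel attention, the l1-norm spatial attention, and all module names and channel widths are hypothetical stand-ins for details the abstract does not give.

```python
# Minimal sketch of the pipeline described in the abstract. All architectural
# specifics here (dilations, channel widths, attention formulations) are
# illustrative assumptions, not the paper's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFeatureExtractor(nn.Module):
    """Extracts multiscale high-level features with different receptive
    fields, realized here as parallel dilated convolutions (one plausible
    form of a pyramid feature extraction module)."""
    def __init__(self, in_ch=1, out_ch=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        # Parallel branches with increasing dilation -> growing receptive field.
        self.branches = nn.ModuleList([
            nn.Conv2d(out_ch, out_ch, 3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])

    def forward(self, x):
        x = F.relu(self.stem(x))
        return [F.relu(b(x)) for b in self.branches]

def channel_attention_fuse(fa, fb):
    """Attention-weighted fusion for global features: per-channel weights
    from global average pooling (an assumed channel-attention choice)."""
    wa = fa.mean(dim=(2, 3), keepdim=True)
    wb = fb.mean(dim=(2, 3), keepdim=True)
    w = torch.softmax(torch.stack([wa, wb]), dim=0)
    return w[0] * fa + w[1] * fb

def spatial_attention_fuse(fa, fb):
    """Attention-weighted fusion for local detail: per-pixel weights from
    the l1-norm of activations across channels (an assumed choice)."""
    sa = fa.abs().sum(dim=1, keepdim=True)
    sb = fb.abs().sum(dim=1, keepdim=True)
    w = torch.softmax(torch.stack([sa, sb]), dim=0)
    return w[0] * fa + w[1] * fb

class Decoder(nn.Module):
    """Reconstructs the fused image from the fused multiscale features."""
    def __init__(self, in_ch=64 * 3, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, feats):
        return torch.sigmoid(self.net(torch.cat(feats, dim=1)))

# Usage: fuse one source image pair end to end.
encoder, decoder = PyramidFeatureExtractor(), Decoder()
img_a, img_b = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
feats_a, feats_b = encoder(img_a), encoder(img_b)
# Finest scale fused with spatial attention (local detail); coarser scales
# with channel attention (global content) -- one reading of the abstract's
# "different fusion strategies at different levels".
fused = [spatial_attention_fuse(feats_a[0], feats_b[0])] + [
    channel_attention_fuse(a, b) for a, b in zip(feats_a[1:], feats_b[1:])
]
fused_img = decoder(fused)  # shape: (1, 1, 128, 128)
```

The design choice mirrored here is the level-dependent strategy: per-pixel spatial attention preserves fine local detail at the finest scale, while channel attention weighs which modality contributes more global content at the coarser scales.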
