Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection

This study addresses the problem of fusing infrared and visible images, which differ markedly in appearance, for object detection. Aiming to generate an image of high visual quality, previous approaches discover commonalities underlying the two modalities and fuse in the common space, either by iterative optimization or with deep networks. These approaches neglect that the modality differences, which carry complementary information, are extremely important for both fusion and the subsequent detection task. This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls it into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network. The fusion network, with one generator and dual discriminators, seeks commonalities while learning from differences, preserving structural information of targets from the infrared image and textural details from the visible image. Furthermore, we build a synchronized imaging system with calibrated infrared and optical sensors, and collect what is currently the most comprehensive benchmark, covering a wide range of scenarios. Extensive experiments on several public datasets and our benchmark demonstrate that our method produces not only visually appealing fusion results but also higher detection mAP than state-of-the-art approaches. The source code and benchmark are available at https://github.com/dlut-dimt/TarDAL.
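
For readers unfamiliar with the term, a bilevel formulation of joint fusion and detection can be sketched generically as below. This is an illustrative form only, not necessarily the exact objective used in the paper; all symbols are assumptions introduced here: $x$ and $y$ denote the infrared and visible inputs, $u$ the fused image, $f$ a fusion objective, $\Psi$ a detection network with parameters $\omega_d$, and $\mathcal{L}_d$ a detection loss.

```latex
\min_{\omega_d} \; \mathcal{L}_d\bigl(\Psi(u^{\ast}; \omega_d)\bigr)
\quad \text{s.t.} \quad
u^{\ast} \in \arg\min_{u} \; f(u; x, y)
```

The upper level trains the detector on the fused image, while the lower level constrains that image to be a solution of the fusion problem, so the two tasks are optimized jointly rather than in sequence.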
