Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution

Self-supervised cross-modal super-resolution (SR) can overcome the difficulty of acquiring paired training data, but it is challenging because only low-resolution (LR) source images and high-resolution (HR) guide images from different modalities are available. Existing methods rely on pseudo or weak supervision in LR space and thus deliver results that are blurry or unfaithful to the source modality. To address this issue, we present a mutual modulation SR (MMSR) model, which tackles the task with a mutual modulation strategy comprising a source-to-guide modulation and a guide-to-source modulation. In these modulations, we develop cross-domain adaptive filters that fully exploit cross-modal spatial dependency, helping induce the source to emulate the resolution of the guide and the guide to mimic the modality characteristics of the source. Moreover, we adopt a cycle consistency constraint to train MMSR in a fully self-supervised manner. Experiments on various tasks demonstrate the state-of-the-art performance of MMSR.
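
The two ideas named above, filters that adapt to a second modality and a cycle consistency constraint, can be illustrated with a toy 1-D sketch. This is not the MMSR architecture: the function names, the squared-difference similarity, the temperature, and the fixed mean-pooling return path (standing in for the learned guide-to-source modulation) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_filter(query, values, radius=1, temperature=1.0):
    """Filter `values` with per-position weights driven by `query`.

    At each position i, the weights come from comparing query[i] (one
    modality) with the neighbouring entries of `values` (the other
    modality). The squared-difference similarity is an assumed choice.
    """
    if len(query) != len(values):
        raise ValueError("query and values must have the same length")
    out = []
    for i, q in enumerate(query):
        window = values[max(0, i - radius): i + radius + 1]
        weights = softmax([-((q - v) ** 2) / temperature for v in window])
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out


def upsample_nearest(signal, scale):
    """Repeat every sample `scale` times."""
    return [v for v in signal for _ in range(scale)]


def downsample_mean(signal, scale):
    """Average non-overlapping blocks of `scale` samples."""
    return [sum(signal[i:i + scale]) / scale
            for i in range(0, len(signal), scale)]


def cycle_l1(lr_source, hr_guide, scale):
    """Return (sr_estimate, cycle_loss) for one toy LR source / HR guide pair.

    Forward path: upsample the source, then filter it with guide-driven
    weights. Return path: mean pooling back to LR (a fixed stand-in for a
    learned mapping). The loss is the mean absolute difference between the
    recovered LR signal and the original LR source.
    """
    coarse = upsample_nearest(lr_source, scale)
    sr = adaptive_filter(hr_guide, coarse)
    back = downsample_mean(sr, scale)
    loss = sum(abs(a - b) for a, b in zip(back, lr_source)) / len(lr_source)
    return sr, loss


if __name__ == "__main__":
    sr, loss = cycle_l1([0.0, 1.0], [0.0, 0.0, 1.0, 1.0], scale=2)
    print(sr, loss)
```

Only the LR source and the HR guide enter the loss, so no HR ground truth in the source modality is needed, which is the sense in which such a constraint is self-supervised.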
