Towards Unpaired Depth Enhancement and Super-Resolution in the Wild

Depth maps captured with commodity sensors are often of low quality and resolution; these maps need to be enhanced to be used in many applications. State-of-the-art data-driven methods of depth map super-resolution rely on registered pairs of lowand high-resolution depth maps of the same scenes. Acquisition of real-world paired data requires specialized setups. Another alternative, generating low-resolution maps from high-resolution maps by subsampling, adding noise and other artificial degradation methods, does not fully capture the characteristics of real-world low-resolution images. As a consequence, supervised learning methods trained on such artificial paired data may not perform well on real-world low-resolution inputs. We consider an approach to depth map enhancement based on learning from unpaired data. While many techniques for unpaired image-to-image translation have been proposed, most are not directly applicable to depth maps. We propose an unpaired learning method for simultaneous depth enhancement and super-resolution, which is based on a learnable degradation model and surface normal estimates as features to produce more accurate depth maps. We demonstrate that our method outperforms existing unpaired methods and performs on par with paired methods on a new benchmark for unpaired learning that we developed.

[1]  Narendra Ahuja,et al.  Deep Joint Image Filtering , 2016, ECCV.

[2]  Thomas H. Li,et al.  StructureFlow: Image Inpainting via Structure-Aware Appearance Flow , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[4]  Zhan Xu,et al.  Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Xiaoou Tang,et al.  Depth Map Super-Resolution by Deep Multi-Scale Guidance , 2016, ECCV.

[6]  Dmitry Ulyanov,et al.  Self-supervised Depth Denoising Using Lower- and Higher-quality RGB-D sensors , 2020, 2020 International Conference on 3D Vision (3DV).

[7]  Minjae Kim,et al.  U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation , 2019, ICLR.

[8]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[9]  Horst Bischof,et al.  ATGV-Net: Accurate Depth Super-Resolution , 2016, ECCV.

[10]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[11]  Horst Bischof,et al.  Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[13]  Alexei A. Efros,et al.  Contrastive Learning for Unpaired Image-to-Image Translation , 2020, ECCV.

[14]  Jing Yang,et al.  To learn image super-resolution, use a GAN to learn how to do image degradation first , 2018, ECCV.

[15]  Stamatios Lefkimmiatis,et al.  Non-local Color Image Denoising with Convolutional Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[18]  Yinda Zhang,et al.  Deep Depth Completion of a Single RGB-D Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[20]  Wei Wu,et al.  Feedback Network for Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Bin Fang,et al.  Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Zitian Zhang,et al.  Multi-Scale Progressive Fusion Learning for Depth Map Super-Resolution , 2020, ArXiv.

[23]  Andrew J. Davison,et al.  A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[24]  Yao Guo,et al.  Coupled Real-Synthetic Domain Adaptation for Real-World Deep Depth Enhancement , 2020, IEEE Transactions on Image Processing.

[25]  Jianfei Cai,et al.  T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks , 2018, ECCV.

[26]  Shunta Maeda,et al.  Unpaired Image Super-Resolution Using Pseudo-Supervision , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Wangmeng Zuo,et al.  Toward Convolutional Blind Denoising of Real Photographs , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Jan Kautz,et al.  Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[29]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[30]  Stamatios Lefkimmiatis,et al.  Universal Denoising Networks : A Novel CNN Architecture for Image Denoising , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Matthias Nießner,et al.  BundleFusion , 2016, TOGS.

[32]  Gregory Shakhnarovich,et al.  Deep Back-Projection Networks for Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[35]  Jie Zhou,et al.  Structure-Preserving Super Resolution With Gradient Guidance , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Michal Irani,et al.  "Zero-Shot" Super-Resolution Using Deep Internal Learning , 2017, CVPR.

[37]  Baining Guo,et al.  Learning Texture Transformer Network for Image Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[39]  Petros Daras,et al.  Self-Supervised Deep Depth Denoising , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[40]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Horst Bischof,et al.  A Deep Primal-Dual Network for Guided Depth Super-Resolution , 2016, BMVC.

[42]  Vladlen Koltun,et al.  Colored Point Cloud Registration Revisited , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Evgeny Burnaev,et al.  Perceptual Deep Depth Super-Resolution , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Gwanggil Jeon,et al.  Joint-adaptive bilateral depth map upsampling , 2014, Signal Process. Image Commun..

[45]  Wenbin Li,et al.  InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset , 2018, BMVC.

[46]  Ruigang Yang,et al.  Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Wei Wang,et al.  Deep Learning for Single Image Super-Resolution: A Brief Review , 2018, IEEE Transactions on Multimedia.

[48]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[49]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[50]  Xi Wang,et al.  High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth , 2014, GCPR.

[51]  Dani Lischinski,et al.  Joint bilateral upsampling , 2007, ACM Trans. Graph..

[52]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[53]  Seungyong Lee,et al.  Reconstruction-Based Pairwise Depth Dataset for Depth Image Enhancement Using CNN , 2018, ECCV.

[54]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.