PMBANet: Progressive Multi-Branch Aggregation Network for Scene Depth Super-Resolution

Depth map super-resolution is an ill-posed inverse problem with several challenges. First, depth boundaries are hard to reconstruct, particularly at large magnification factors. Second, depth regions belonging to fine structures and tiny objects in the scene are severely damaged by downsampling degradation. To tackle these difficulties, we propose a progressive multi-branch aggregation network (PMBANet), which stacks multi-branch aggregation (MBA) blocks to address these problems and progressively recover the degraded depth map. Specifically, each MBA block has multiple parallel branches: 1) The reconstruction branch is built on the designed attention-based error feed-forward and feed-back modules, which iteratively exploit and compensate for the downsampling errors to refine the depth map, while the attention mechanism imposed on these modules gradually highlights informative features at depth boundaries. 2) We formulate a separate guidance branch as prior knowledge to help recover depth details, in which a multi-scale branch learns a multi-scale representation that attends to objects of different scales, and a color branch regularizes the depth map using auxiliary color information. Then, a fusion block adaptively fuses and selects discriminative features from all the branches. The design of the whole network is well founded, and extensive experiments on benchmark datasets demonstrate that our method achieves superior performance compared with state-of-the-art methods. Our code and models are available at https://github.com/Sunbaoli/PMBANet_DSR/.
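The error feed-forward/feed-back idea in the reconstruction branch is closely related to classical iterative back-projection: degrade the current high-resolution estimate, measure the residual against the low-resolution input, project that residual back up, and add it to the estimate. The sketch below is a minimal, non-learned 1-D illustration of that classical scheme, not PMBANet's module. It assumes average pooling as the degradation, linear interpolation for the initial estimate, and nearest-neighbour projection of the residual; all function names are hypothetical. In PMBANet these steps are carried out by learned, attention-based modules rather than fixed operators.

```python
from typing import List


def downsample_avg(x: List[float], s: int) -> List[float]:
    """ASSUMED degradation: average-pool non-overlapping windows of size s."""
    return [sum(x[i:i + s]) / s for i in range(0, len(x), s)]


def upsample_nearest(x: List[float], s: int) -> List[float]:
    """Fixed feed-back projection (assumption): repeat each LR sample s times."""
    return [v for v in x for _ in range(s)]


def upsample_linear(x: List[float], s: int) -> List[float]:
    """Initial HR estimate by linear interpolation, aligned on pixel centres."""
    n = len(x)
    out = []
    for i in range(n * s):
        t = (i + 0.5) / s - 0.5            # HR pixel centre in LR coordinates
        t = min(max(t, 0.0), n - 1.0)      # clamp at the borders
        j = int(t)
        j1 = min(j + 1, n - 1)
        w = t - j
        out.append((1.0 - w) * x[j] + w * x[j1])
    return out


def back_project(lr: List[float], s: int, iters: int = 5) -> List[float]:
    """Classical (non-learned) iterative back-projection on a 1-D signal."""
    hr = upsample_linear(lr, s)
    for _ in range(iters):
        # error "feed-forward": degrade the estimate and compare in LR space
        err_lr = [a - b for a, b in zip(lr, downsample_avg(hr, s))]
        # error "feed-back": project the LR residual to HR and compensate
        hr = [h + e for h, e in zip(hr, upsample_nearest(err_lr, s))]
    return hr


if __name__ == "__main__":
    lr_depth = [1.0, 1.0, 5.0, 5.0]        # toy depth profile with one step edge
    hr_depth = back_project(lr_depth, s=2)
    print(hr_depth)                        # [1.0, 1.0, 0.5, 1.5, 4.5, 5.5, 5.0, 5.0]
    print(downsample_avg(hr_depth, 2))     # [1.0, 1.0, 5.0, 5.0]
```

In this toy run the refined profile is exactly consistent with the low-resolution input, yet it overshoots on both sides of the depth step (0.5 below the near plane, 5.5 above the far plane). That behaviour is one way to see why depth boundaries are hard to recover with fixed operators, and why the abstract pairs error compensation with attention focused on boundary features.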
