Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Depth maps obtained by commercial depth sensors are always in low-resolution, making it difficult to be used in various computer vision tasks. Thus, depth map super-resolution (SR) is a practical and valuable task, which up-scales the depth map into high-resolution (HR) space. However, limited by the lack of real-world paired low-resolution (LR) and HR depth maps, most existing methods use down-sampling to obtain paired training samples. To this end, we first construct a large-scale dataset named "RGB-D-D", which can greatly promote the study of depth map SR and even more depth-related real-world tasks. The "D-D" in our dataset represents the paired LR and HR depth maps captured from mobile phone and Lucid Helios respectively ranging from indoor scenes to challenging outdoor scenes. Besides, we provide a fast depth map super-resolution (FDSR) baseline, in which the high-frequency component adaptively decomposed from RGB image to guide the depth map SR. Extensive experiments on existing public datasets demonstrate the effectiveness and efficiency of our network compared with the state-of-the-art methods. Moreover, for the real-world LR depth maps, our algorithm can produce more accurate HR depth maps with clearer boundaries and to some extent correct the depth value errors.

[1]  Xiaoou Tang,et al.  Depth Map Super-Resolution by Deep Multi-Scale Guidance , 2016, ECCV.

[2]  Andrea F. Abate,et al.  2D and 3D face recognition: A survey , 2007, Pattern Recognit. Lett..

[3]  Dani Lischinski,et al.  Colorization using optimization , 2004, ACM Trans. Graph..

[4]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[5]  Sebastian Thrun,et al.  An Application of Markov Random Fields to Range Sensing , 2005, NIPS.

[6]  Andrew J. Davison,et al.  A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Jean Ponce,et al.  Deformable Kernel Networks for Joint Image Filtering , 2019, International Journal of Computer Vision.

[8]  Michael O. Polley,et al.  Wavelet Synthesis Net for Disparity Estimation to Synthesize DSLR Calibre Bokeh Effect on Smartphones , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Stephen Lin,et al.  Data-driven depth map refinement via multi-scale sparse representation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Christopher Joseph Pal,et al.  Learning Conditional Random Fields for Stereo , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Il Hong Suh,et al.  From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation , 2019, ArXiv.

[12]  Hang Su,et al.  Pixel-Adaptive Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Markus H. Gross,et al.  StereoBrush: interactive 2D to 3D conversion using discontinuous warps , 2011, SBIM '11.

[14]  Atsuto Maki,et al.  Towards a simulation driven stereo vision system , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[15]  Marc Pollefeys,et al.  Turning Mobile Phones into 3D Scanners , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Marcel Worring,et al.  Watersnakes: Energy-Driven Watershed Segmentation , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xi Wang,et al.  High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth , 2014, GCPR.

[19]  Huazhu Fu,et al.  Hierarchical Features Driven Residual Learning for Depth Map Super-Resolution , 2019, IEEE Transactions on Image Processing.

[20]  Michael S. Brown,et al.  High quality depth map upsampling for 3D-TOF cameras , 2011, 2011 International Conference on Computer Vision.

[21]  Ping Li,et al.  Deep Color Guided Coarse-to-Fine Convolutional Network Cascade for Depth Image Super-Resolution , 2019, IEEE Transactions on Image Processing.

[22]  Dani Lischinski,et al.  Joint bilateral upsampling , 2007, ACM Trans. Graph..

[23]  Lifeng Sun,et al.  Joint Example-Based Depth Map Super-Resolution , 2012, 2012 IEEE International Conference on Multimedia and Expo.

[24]  Zhifeng Chen,et al.  Multi-Scale Frequency Reconstruction for Guided Depth Map Super-Resolution via Deep Residual Network , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[25]  Narendra Ahuja,et al.  Joint Image Filtering with Deep Convolutional Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Horst Bischof,et al.  Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[28]  Kun Li,et al.  Depth Recovery Using an Adaptive Color-Guided Auto-Regressive Model , 2012, ECCV.

[29]  Jian Sun,et al.  Guided Image Filtering , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Heiko Hirschmüller,et al.  Evaluation of Cost Functions for Stereo Matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Horst Bischof,et al.  ATGV-Net: Accurate Depth Super-Resolution , 2016, ECCV.

[32]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[33]  Andrew Owens,et al.  SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[34]  Feng Liu,et al.  Depth Enhancement via Low-Rank Matrix Completion , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Richard Szeliski,et al.  High-accuracy stereo depth maps using structured light , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[36]  Pavlo Molchanov,et al.  Hand gesture recognition with 3D convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[37]  Éric Marchand,et al.  Pose Estimation for Augmented Reality: A Hands-On Survey , 2016, IEEE Transactions on Visualization and Computer Graphics.

[38]  Narendra Ahuja,et al.  Deep Joint Image Filtering , 2016, ECCV.

[39]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[40]  Shuicheng Yan,et al.  Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Jonathan T. Barron,et al.  A category-level 3-D object dataset: Putting the Kinect to work , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[42]  Jinhui Tang,et al.  Spatially Variant Linear Representation Models for Joint Filtering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Yinda Zhang,et al.  Deep Depth Completion of a Single RGB-D Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Jonathan T. Barron,et al.  The Fast Bilateral Solver , 2015, ECCV.