Semantic-Aware Depth Super-Resolution in Outdoor Scenes

While depth sensors are becoming increasingly popular, their spatial resolution often remains limited. Depth super-resolution has therefore emerged as a solution to this problem. Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which often cannot be acquired in practice due to the sensors' limitations. By contrast, we introduce an approach to depth super-resolution in more challenging conditions, such as outdoor scenes. To this end, we first propose to exploit semantic information to better constrain the super-resolution process. In particular, we design a co-sparse analysis model that learns filters from joint intensity, depth and semantic information. Furthermore, we show how low-resolution training depth maps can be employed in our learning strategy. We demonstrate the benefits of our approach over state-of-the-art depth super-resolution methods on two outdoor scene datasets.
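
For readers unfamiliar with the term, the generic idea behind a co-sparse analysis model can be illustrated with a minimal sketch: an analysis operator is applied to a signal, and the signal is considered well modeled when many of the filter responses are zero. The sketch below is not the learned joint intensity, depth and semantic model described above. It assumes a hand-picked forward-difference operator in place of learned filters, and a hypothetical 1D depth profile; the names `finite_difference_operator` and `cosparsity` are illustrative only.

```python
import numpy as np


def finite_difference_operator(n):
    """Return the (n-1) x n forward-difference matrix.

    Assumption: this hand-picked operator stands in for learned filters.
    """
    omega = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    omega[idx, idx] = -1.0
    omega[idx, idx + 1] = 1.0
    return omega


def cosparsity(omega, x, tol=1e-8):
    """Count the rows of omega whose response on x is numerically zero."""
    return int(np.sum(np.abs(omega @ x) <= tol))


# Hypothetical piecewise-constant depth profile with a single discontinuity.
depth = np.array([2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0])
omega = finite_difference_operator(depth.size)
depth_cosparsity = cosparsity(omega, depth)

# Adding noise makes the filter responses nonzero, lowering the co-sparsity.
rng = np.random.default_rng(0)
noisy = depth + 0.1 * rng.standard_normal(depth.size)
noisy_cosparsity = cosparsity(omega, noisy)

print(depth_cosparsity, noisy_cosparsity)
```

In this toy setting the clean profile yields a zero response everywhere except at the depth discontinuity, whereas the noisy profile does not. A co-sparse prior rewards the former, which is the kind of constraint a super-resolution method can use to favor sharp, piecewise-smooth depth maps.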
