Data-driven depth map refinement via multi-scale sparse representation

Depth maps captured by consumer-level depth cameras such as Kinect are usually degraded by noise, missing values, and quantization. In this paper, we present a data-driven approach for refining degraded RAW depth maps that are coupled with an RGB image. The key idea of our approach is to take advantage of a training set of high-quality depth data and transfer its information to the RAW depth map through multi-scale dictionary learning. Using a sparse representation, our method learns a dictionary of geometric primitives that captures the correlation between high-quality mesh data, RAW depth maps, and RGB images. The dictionary is learned and applied in a manner that accounts for various practical issues that arise in dictionary-based depth refinement. Compared to previous approaches that utilize only the correlation between RAW depth maps and RGB images, our method produces improved depth maps without over-smoothing. Since our approach is data-driven, the refinement can be targeted to a specific class of objects by employing a corresponding training set. In our experiments, we show that this leads to additional improvements in recovering depth maps of human faces.
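
The general idea of transferring information through a coupled sparse representation can be illustrated with a minimal sketch. It is not the implementation described in this paper: it assumes two dictionaries with matching atoms, one for observed features (here called `D_obs`, standing in for RAW depth and RGB patches) and one for high-quality depth patches (`D_hq`), and it uses plain orthogonal matching pursuit as a generic sparse solver. The names `omp`, `refine_patch`, `D_obs`, and `D_hq` are hypothetical, the dictionaries are random placeholders rather than learned ones, and the multi-scale handling and the dictionary learning step are omitted.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y using at most k columns of D.

    Assumes the columns (atoms) of D have unit norm.
    """
    residual = y.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Refit all selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef


def refine_patch(D_obs, D_hq, y_obs, k):
    """Code the observed patch over D_obs, then reconstruct with D_hq.

    Hypothetical helper: assumes atom j of D_obs and atom j of D_hq
    describe the same underlying geometric primitive.
    """
    code = omp(D_obs, y_obs, k)
    return D_hq @ code


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_atoms = 40
    # Random placeholder dictionaries; a real system would learn these.
    D_obs = rng.standard_normal((32, n_atoms))
    D_obs /= np.linalg.norm(D_obs, axis=0)
    D_hq = rng.standard_normal((25, n_atoms))
    # Synthetic observed patch built from two atoms.
    y_obs = D_obs[:, [3, 17]] @ np.array([1.5, -0.7])
    refined = refine_patch(D_obs, D_hq, y_obs, k=3)
    print(refined.shape)
```

The sparse code is estimated only from the observed channels and then reused with the high-quality dictionary, which is how a coupled dictionary lets training data influence the refined output in this simplified setting.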
