DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs

Consumer depth sensors are more and more popular and come to our daily lives marked by its recent integration in the latest Iphone X. However, they still suffer from heavy noises which limit their applications. Although plenty of progresses have been made to reduce the noises and boost geometric details, due to the inherent illness and the real-time requirement, the problem is still far from been solved. We propose a cascaded Depth Denoising and Refinement Network (DDRNet) to tackle this problem by leveraging the multi-frame fused geometry and the accompanying high quality color image through a joint training strategy. The rendering equation is exploited in our network in an unsupervised manner. In detail, we impose an unsupervised loss based on the light transport to extract the high-frequency geometry. Experimental results indicate that our network achieves real-time single depth enhancement on various categories of scenes. Thanks to the well decoupling of the low and high frequency information in the cascaded network, we achieve superior performance over the state-of-the-art techniques.

[1]  Thabo Beeler,et al.  High-quality single-shot capture of facial geometry , 2010, SIGGRAPH 2010.

[2]  Sergio Guadarrama,et al.  The Devil is in the Decoder , 2017, BMVC.

[3]  Peter V. Gehler,et al.  Reflectance Adaptive Filtering Improves Intrinsic Image Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Sebastian Thrun,et al.  A Noise‐aware Filter for Real‐time Depth Upsampling , 2008 .

[5]  Dieter Fox,et al.  RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments , 2012, Int. J. Robotics Res..

[6]  Hans-Peter Seidel,et al.  Shading-based dynamic shape refinement from multi-view video under general illumination , 2011, 2011 International Conference on Computer Vision.

[7]  Jitendra Malik,et al.  Intrinsic Scene Properties from a Single RGB-D Image , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Stephen Lin,et al.  Shading-Based Shape Refinement of RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Ruigang Yang,et al.  Spatial-Depth Super Resolution for Range Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  StammingerMarc,et al.  Real-time shading-based refinement for consumer depth cameras , 2014 .

[12]  Jiawen Chen,et al.  Scalable real-time volumetric surface reconstruction , 2013, ACM Trans. Graph..

[13]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Michael S. Brown,et al.  High quality depth map upsampling for 3D-TOF cameras , 2011, 2011 International Conference on Computer Vision.

[15]  Sebastian Thrun,et al.  An Application of Markov Random Fields to Range Sensing , 2005, NIPS.

[16]  Vincent Dumoulin,et al.  Deconvolution and Checkerboard Artifacts , 2016 .

[17]  Ron Kimmel,et al.  Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[19]  Nazar Khan,et al.  Training many-parameter shape-from-shading models using a surface database , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[20]  In-So Kweon,et al.  High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Derek Bradley,et al.  Improved Reconstruction of Deforming Surfaces by Cancelling Ambient Occlusion , 2012, ECCV.

[22]  Berthold K. P. Horn Obtaining shape from shading information , 1989 .

[23]  Jian Sun,et al.  Guided Image Filtering , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[26]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Sebastian Thrun,et al.  Upsampling range data in dynamic environments , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  K. Hartmann,et al.  Data-Fusion of PMD-Based Distance-Information and High-Resolution RGB-Images , 2007, 2007 International Symposium on Signals, Circuits and Systems.

[29]  Thabo Beeler,et al.  High-quality single-shot capture of facial geometry , 2010, ACM Trans. Graph..

[30]  Qionghai Dai,et al.  DoubleFusion: Real-Time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Xiaoyang Liu,et al.  Real-Time Geometry, Albedo, and Motion Reconstruction Using a Single RGB-D Camera , 2017, ACM Trans. Graph..

[32]  Christian Theobalt,et al.  On-set performance capture of multiple actors with a stereo camera , 2013, ACM Trans. Graph..

[33]  Tao Yu,et al.  BodyFusion: Real-Time Capture of Human Motion and Surface Geometry Using a Single Depth Camera , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Gerd Hirzinger,et al.  Learning shape from shading by a multilayer network , 1996, IEEE Trans. Neural Networks.

[35]  Jufu Feng,et al.  A Novel Facial Appearance Descriptor Based on Local Binary Pattern , 2008, 2008 Chinese Conference on Pattern Recognition.

[36]  Matan Sela,et al.  Learning Detailed Face Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[38]  Dani Lischinski,et al.  Joint bilateral upsampling , 2007, ACM Trans. Graph..

[39]  Alfred M. Bruckstein,et al.  RGBD-fusion: Real-time high precision depth recovery , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Ruigang Yang,et al.  Fusion of time-of-flight depth and stereo for high accuracy depth maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Horst Bischof,et al.  OctNetFusion: Learning Depth Fusion from Data , 2017, 2017 International Conference on 3D Vision (3DV).

[43]  Dieter Fox,et al.  DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[45]  Ping-Sing Tsai,et al.  Shape from Shading: A Survey , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[46]  James T. Kajiya,et al.  The rendering equation , 1986, SIGGRAPH.

[47]  Hans-Peter Seidel,et al.  Coherent Spatiotemporal Filtering, Upsampling and Rendering of RGBZ Videos , 2012, Comput. Graph. Forum.

[48]  Paul J. Besl,et al.  Method for registration of 3-D shapes , 1992, Other Conferences.

[49]  Didier Stricker,et al.  Algorithms for 3D Shape Scanning with a Depth Camera , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[51]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).