Learned Dynamic Guidance for Depth Image Reconstruction

The depth images acquired by consumer depth sensors (e.g., Kinect and ToF) usually are of low resolution and insufficient quality. One natural solution is to incorporate a high resolution RGB camera and exploit the statistical correlation of its data and depth. In recent years, both optimization-based and learning-based approaches have been proposed to deal with the guided depth reconstruction problems. In this paper, we introduce a weighted analysis sparse representation (WASR) model for guided depth image enhancement, which can be considered a generalized formulation of a wide range of previous optimization-based models. We unfold the optimization by the WASR model and conduct guided depth reconstruction with dynamically changed stage-wise operations. Such a guidance strategy enables us to dynamically adjust the stage-wise operations that update the depth image, thus improving the reconstruction quality and speed. To learn the stage-wise operations in a task-driven manner, we propose two parameterizations and their corresponding methods: dynamic guidance with Gaussian RBF nonlinearity parameterization (DG-RBF) and dynamic guidance with CNN nonlinearity parameterization (DG-CNN). The network structures of the proposed DG-RBF and DG-CNN methods are designed with the the objective function of our WASR model in mind and the optimal network parameters are learned from paired training data. Such optimization-inspired network architectures enable our models to leverage the previous expertise as well as take benefit from training data. The effectiveness is validated for guided depth image super-resolution and for realistic depth image reconstruction tasks using standard benchmarks. Our DG-RBF and DG-CNN methods achieve the best quantitative results (RMSE) and better visual quality than the state-of-the-art approaches at the time of writing. The code is available at https://github.com/ShuhangGu/GuidedDepthSR.

[1]  Stefan Roth,et al.  Shrinkage Fields for Effective Image Restoration , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[3]  Guy Gilboa,et al.  Nonlocal Operators with Applications to Image Processing , 2008, Multiscale Model. Simul..

[4]  Jean Ponce,et al.  Task-Driven Dictionary Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Luc Van Gool,et al.  Dense 3D Regression for Hand Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Justin Domke,et al.  Learning Graphical Model Parameters with Approximate Marginal Inference , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Ruigang Yang,et al.  Spatial-Depth Super Resolution for Range Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Jean Ponce,et al.  Robust image filtering using joint static and dynamic guidance , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[10]  Lei Zhang,et al.  Joint Convolutional Analysis and Synthesis Sparse Representation for Single Image Layer Separation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12]  Dani Lischinski,et al.  Joint bilateral upsampling , 2007, ACM Trans. Graph..

[13]  Yao Wang,et al.  Color-Guided Depth Recovery From RGB-D Data Using an Adaptive Autoregressive Model , 2014, IEEE Transactions on Image Processing.

[14]  Sebastian Thrun,et al.  A Noise‐aware Filter for Real‐time Depth Upsampling , 2008 .

[15]  Hrvoje Benko,et al.  Hybrid HFR Depth: Fusing Commodity Depth and Color Cameras to Achieve High Frame Rate, Low Latency Depth Camera Interactions , 2017, CHI.

[16]  Lei Zhang,et al.  Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision , 2016, International Journal of Computer Vision.

[17]  Emmanuel J. Candès,et al.  The curvelet transform for image denoising , 2002, IEEE Trans. Image Process..

[18]  Chih-Yuan Yang,et al.  Single-Image Super-Resolution: A Benchmark , 2014, ECCV.

[19]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[20]  Michael Elad,et al.  Analysis K-SVD: A Dictionary-Learning Algorithm for the Analysis Sparse Model , 2013, IEEE Transactions on Signal Processing.

[21]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[22]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[23]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[24]  Thomas Pock,et al.  Variational Networks: Connecting Variational Methods and Deep Learning , 2017, GCPR.

[25]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[26]  Feng Liu,et al.  Depth Enhancement via Low-Rank Matrix Completion , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Song-Chun Zhu Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling , 1998 .

[28]  Luc Van Gool,et al.  Seven Ways to Improve Example-Based Single Image Super Resolution , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Dani Lischinski,et al.  A Closed-Form Solution to Natural Image Matting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Yoshua Bengio,et al.  Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .

[31]  Heiko Hirschmüller,et al.  Evaluation of Cost Functions for Stereo Matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  S. Osher,et al.  Coordinate descent optimization for l 1 minimization with application to compressed sensing; a greedy algorithm , 2009 .

[33]  Horst Bischof,et al.  A Deep Primal-Dual Network for Guided Depth Super-Resolution , 2016, BMVC.

[34]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Michael Elad,et al.  Analysis versus synthesis in signal priors , 2006, 2006 14th European Signal Processing Conference.

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  Narendra Ahuja,et al.  Deep Joint Image Filtering , 2016, ECCV.

[38]  Michael S. Brown,et al.  High quality depth map upsampling for 3D-TOF cameras , 2011, 2011 International Conference on Computer Vision.

[39]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[40]  Michael J. Black,et al.  Fields of Experts , 2009, International Journal of Computer Vision.

[41]  Xiaoou Tang,et al.  Depth Map Super-Resolution by Deep Multi-Scale Guidance , 2016, ECCV.

[42]  Sebastian Thrun,et al.  An Application of Markov Random Fields to Range Sensing , 2005, NIPS.

[43]  Jian Sun,et al.  Deep ADMM-Net for Compressive Sensing MRI , 2016, NIPS.

[44]  Chongyu Chen,et al.  Learning Dynamic Guidance for Depth Image Enhancement , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Horst Bischof,et al.  Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation , 2013, 2013 IEEE International Conference on Computer Vision.

[46]  Jonathan T. Barron,et al.  The Fast Bilateral Solver , 2015, ECCV.

[47]  Ruigang Yang,et al.  Fusion of time-of-flight depth and stereo for high accuracy depth maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Martin Kleinsteuber,et al.  A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution , 2013, 2013 IEEE International Conference on Computer Vision.

[49]  Yunjin Chen,et al.  Insights Into Analysis Operator Learning: From Patch-Based Sparse Models to Higher Order MRFs , 2014, IEEE Transactions on Image Processing.

[50]  Michael F. Cohen,et al.  Digital photography with flash and no-flash image pairs , 2004, ACM Trans. Graph..

[51]  Stephen Lin,et al.  Data-driven depth map refinement via multi-scale sparse representation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[53]  Jian Sun,et al.  Guided Image Filtering , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Li Xu,et al.  Mutual-Structure for Joint Filtering , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[55]  Yi He,et al.  On the Convergence of Learning-Based Iterative Methods for Nonconvex Inverse Problems , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Vincent Lepetit,et al.  Hands Deep in Deep Learning for Hand Pose Estimation , 2015, ArXiv.

[58]  Wei Yu,et al.  On learning optimized reaction diffusion processes for effective image restoration , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Marc Teboulle,et al.  A fast Iterative Shrinkage-Thresholding Algorithm with application to wavelet-based image deblurring , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[60]  Lei Zhang,et al.  Discriminative learning of iteration-wise priors for blind deconvolution , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  L. Rudin,et al.  Nonlinear total variation based noise removal algorithms , 1992 .

[62]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[63]  Lei Zhang,et al.  Weighted Nuclear Norm Minimization with Application to Image Denoising , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Ruigang Yang,et al.  Reliability Fusion of Time-of-Flight Depth and Stereo Geometry for High Quality Depth Maps , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.