Deformable kernel networks for guided depth map upsampling

We address the problem of upsampling a low-resolution (LR) depth map using a registered high-resolution (HR) color image of the same scene. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details from LR depth and HR color images, and regress upsampling results directly from the networks. In this paper, we revisit the weighted averaging process that has been widely used to transfer structural details from hand-crafted visual features to LR depth maps. We instead learn explicitly sparse and spatially-variant kernels for this task. To this end, we propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sparse sets of neighbors and the corresponding weights adaptively for each pixel. We also propose a fast version of DKN (FDKN) that runs about 17 times faster (0.01 seconds for a HR image of size 640 x 480). Experimental results on standard benchmarks demonstrate the effectiveness of our approach. In particular, we show that the weighted averaging process with 3 x 3 kernels (i.e., aggregating 9 samples sparsely chosen) outperforms the state of the art by a significant margin.

[1]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Convolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jean Ponce,et al.  Robust Guided Image Filtering Using Nonconvex Potentials , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[5]  Kaiqi Huang,et al.  Fast End-to-End Trainable Guided Filter , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Qi Zhang,et al.  Rolling Guidance Filter , 2014, ECCV.

[7]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Narendra Ahuja,et al.  Deep Joint Image Filtering , 2016, ECCV.

[9]  Horst Bischof,et al.  Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Michael S. Brown,et al.  High quality depth map upsampling for 3D-TOF cameras , 2011, 2011 International Conference on Computer Vision.

[11]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[12]  Li Xu,et al.  Structure extraction from texture via relative total variation , 2012, ACM Trans. Graph..

[13]  Xiaoou Tang,et al.  Depth Map Super-Resolution by Deep Multi-Scale Guidance , 2016, ECCV.

[14]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[15]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[16]  Narendra Ahuja,et al.  Joint Image Filtering with Deep Convolutional Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[19]  Xiaopeng Zhang,et al.  Cross-Field Joint Image Restoration via Scale Map , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Jian Sun,et al.  Guided Image Filtering , 2010, ECCV.

[21]  Aykut Erdem,et al.  Structure-preserving image smoothing via region covariances , 2013, ACM Trans. Graph..

[22]  Heiko Hirschmüller,et al.  Evaluation of Cost Functions for Stereo Matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Lihi Zelnik-Manor,et al.  How to Evaluate Foreground Maps , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Feng Liu,et al.  Depth Enhancement via Low-Rank Matrix Completion , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.