Single Image Depth Prediction with Wavelet Decomposition

We present a novel method for predicting accurate depths from monocular images with high efficiency. This optimal efficiency is achieved by exploiting wavelet decomposition, which is integrated in a fully differentiable encoder-decoder architecture. We demonstrate that we can reconstruct high-fidelity depth maps by predicting sparse wavelet coefficients.In contrast with previous works, we show that wavelet coefficients can be learned without direct supervision on coefficients. Instead we supervise only the final depth image that is reconstructed through the inverse wavelet transform. We additionally show that wavelet coefficients can be learned in fully self-supervised scenarios, without access to ground-truth depth. Finally, we apply our method to different state-of-the-art monocular depth estimation models, in each case giving similar or better results compared to the original model, while requiring less than half the multiply-adds in the decoder network.

[1]  Anelia Angelova,et al.  Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[3]  H. Hirschmüller Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information , 2005, CVPR.

[4]  Vincent Lepetit,et al.  SharpNet: Fast and Accurate Recovery of Occluding Contours in Monocular Depth Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[5]  Chunhua Shen,et al.  Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video , 2019, NeurIPS.

[6]  Gabriel J. Brostow,et al.  Self-Supervised Monocular Depth Hints , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Vishal Monga,et al.  Deep Wavelet Prediction for Image Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Rares Ambrus,et al.  3D Packing for Self-Supervised Monocular Depth Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[11]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Zhichao Yin,et al.  GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Cordelia Schmid,et al.  Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Wei Xu,et al.  Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Peter Wonka,et al.  High Quality Monocular Depth Estimation via Transfer Learning , 2018, ArXiv.

[18]  Michael O. Polley,et al.  Wavelet Synthesis Net for Disparity Estimation to Synthesize DSLR Calibre Bokeh Effect on Smartphones , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Chunhua Shen,et al.  Structured Knowledge Distillation for Dense Prediction , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[21]  Suchendra M. Bhandarkar,et al.  DepthNet: A Recurrent Neural Network Architecture for Monocular Depth Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[22]  Jaejun Yoo,et al.  Deep Convolutional Framelet Denosing for Low-Dose CT via Wavelet Residual Network , 2018, IEEE Transactions on Medical Imaging.

[23]  Johannes Kopf,et al.  Fast depth densification for occlusion-aware augmented reality , 2018, ACM Trans. Graph..

[24]  Bo Chen,et al.  NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications , 2018, ECCV.

[25]  Stefano Mattoccia,et al.  Towards Real-Time Unsupervised Monocular Depth Estimation on CPU , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[27]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[28]  Michael J. Black,et al.  Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Gabriel J. Brostow,et al.  Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Mai Xu,et al.  Wavelet Domain Style Transfer for an Effective Perception-Distortion Tradeoff in Single Image Super-Resolution , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Tianqi Chen,et al.  Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[33]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[34]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[35]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Shanxin Yuan,et al.  Wavelet-Based Dual-Branch Network for Image Demoireing , 2020, ECCV.

[37]  Sertac Karaman,et al.  FastDepth: Fast Monocular Depth Estimation on Embedded Systems , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[38]  Fergal Cotter,et al.  Uses of complex wavelets in deep convolutional neural networks , 2020 .

[39]  Friedrich Fraundorfer,et al.  Comparison of monocular depth estimation methods using geometrically relevant metrics on the IBims-1 dataset , 2020, Comput. Vis. Image Underst..

[40]  Ning Xu,et al.  Slimmable Neural Networks , 2018, ICLR.

[41]  I. Johnstone,et al.  Ideal spatial adaptation by wavelet shrinkage , 1994 .

[42]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Kaiming He,et al.  PointRend: Image Segmentation As Rendering , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  David L. Donoho,et al.  De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.

[45]  Zhiqiang Shen,et al.  Learning Efficient Convolutional Networks through Network Slimming , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Yanyun Qu,et al.  Deep Wavelet Network with Domain Adaptation for Single Image Demoireing , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[47]  Stefano Mattoccia,et al.  Real-Time Single Image Depth Perception in the Wild with Handheld Devices , 2020, Sensors.

[48]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[50]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Thomas Brox,et al.  Sparsity Invariant CNNs , 2017, 2017 International Conference on 3D Vision (3DV).

[53]  Vincent Lepetit,et al.  Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Dani Lischinski,et al.  Colorization using optimization , 2004, ACM Trans. Graph..

[55]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[56]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[57]  Michael W. Marcellin,et al.  JPEG2000 - image compression fundamentals, standards and practice , 2013, The Kluwer international series in engineering and computer science.

[58]  Friedrich Fraundorfer,et al.  Evaluation of CNN-based Single-Image Depth Estimation Methods , 2018, ECCV Workshops.

[59]  Thierry Blu,et al.  Mathematical properties of the JPEG2000 wavelet filters , 2003, IEEE Trans. Image Process..

[60]  Tom Drummond,et al.  CReaM: Condensed Real-time Models for Depth Prediction using Convolutional Neural Networks , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[61]  Xiangyu Zhang,et al.  Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[62]  Kilian Q. Weinberger,et al.  Convolutional Networks with Dense Connectivity , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  James Zijun Wang,et al.  Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers , 2018, ICLR.

[64]  A. Haar Zur Theorie der orthogonalen Funktionensysteme , 1910 .

[65]  Tieniu Tan,et al.  Wavelet-SRNet: A Wavelet-Based CNN for Multi-scale Face Super Resolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[66]  Wei Li,et al.  WaveletStereo: Learning Wavelet Coefficients of Disparity Map in Stereo Matching , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Zhihui Lai,et al.  Wavelet Integrated CNNs for Noise-Robust Image Classification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).