论文信息 - Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks

Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks

Monocular depth estimation is a foundation task of three-dimensional (3D) reconstruction which is used to improve the accuracy of environment perception. Because of the simpler hardware requirement, it is more suitable than other multi-view methods. In this study, a new monocular depth estimation algorithm based on graph convolution network (GCN) is proposed. The pixel-wise depth relationship is introduced into conventional convolution neural network (CNN) to make up the disadvantage of processing non-Euclidian data. And the remaining depth topological graph information on the spatial latent variables are extracted based on a multi-scale reconstruction strategy. The final results on NYU-v2 depth dataset and KITTI depth dataset demonstrate that our algorithm improves the quality of monocular depth estimation, especially there are several little objects coexisting in the scenes.

Jun Liang | Junwei Fu | Ziyang Wang

[1] Nassir Navab,et al. Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[2] Sébastien Ourselin,et al. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations , 2017, DLMIA/ML-CDS@MICCAI.

[3] Rob Fergus,et al. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[4] Joan Bruna,et al. Spectral Networks and Locally Connected Networks on Graphs , 2013, ICLR.

[5] Bo Dai,et al. Detecting Visual Relationships with Deep Relational Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[7] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Danfei Xu,et al. Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] David Mumford,et al. Statistics of range images , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[10] Shunli Zhang,et al. Monocular depth estimation with guidance of surface normal map , 2017, Neurocomputing.

[11] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[12] Gang Sun,et al. Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[14] Chunhua Shen,et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[17] Bo Li,et al. Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference , 2017, Pattern Recognit..

[18] Rob Fergus,et al. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[19] Ming Xu,et al. Channel-Max, Channel-Drop and Stochastic Max-pooling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[20] Wei-Xing Wang,et al. Automatic Depth Map Estimation of Monocular Indoor Environments , 2008, 2008 International Conference on MultiMedia and Information Technology.

[21] Zhuowen Tu,et al. Generalizing Pooling Functions in CNNs: Mixed, Gated, and Tree , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Nicu Sebe,et al. Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23] Hai Tao,et al. Review of deep convolution neural network in image classification , 2017, 2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET).

[24] Xaq Pitkow,et al. Skip Connections Eliminate Singularities , 2017, ICLR.

[25] Oisin Mac Aodha,et al. Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Lin Zhang,et al. Super-Resolution for Monocular Depth Estimation With Multi-Scale Sub-Pixel Convolutions and a Smoothness Constraint , 2019, IEEE Access.

[27] F. Scarselli,et al. A new model for learning in graph domains , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[28] Silvio Savarese,et al. Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Zhiguo Cao,et al. Deep attention-based classification network for robust depth prediction , 2018, ACCV.

[30] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[31] Philip H. S. Torr,et al. Coarse-to-fine Planar Regularization for Dense Monocular Depth Estimation , 2016, ECCV.

[32] Palash Goyal,et al. Graph Embedding Techniques, Applications, and Performance: A Survey , 2017, Knowl. Based Syst..

[33] Patrick Flandrin,et al. A complete ensemble empirical mode decomposition with adaptive noise , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34] Nicu Sebe,et al. Multi-scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Toby P. Breckon,et al. Real-Time Monocular Depth Estimation Using Synthetic Data with Domain Adaptation via Image Style Transfer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36] Max Welling,et al. Variational Graph Auto-Encoders , 2016, ArXiv.

[37] Xavier Bresson,et al. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[38] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[39] Derek Hoiem,et al. Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[40] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[41] Ian D. Reid,et al. Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42] Dacheng Tao,et al. Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[44] Yejin Choi,et al. Neural Motifs: Scene Graph Parsing with Global Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45] John Flynn,et al. Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).