Depth information calculation method for unstructured objects based on deep neural network

Depth perception in images of unstructured scenes is an important problem for computer-vision applications. This paper proposes a deep-learning method combined with a self-attention mechanism to infer the depth of unstructured indoor targets, which effectively addresses blurred detail and insufficient layering in depth inference for unstructured scenes. First, a deep encoder-decoder model is trained on large 3D datasets to learn the depth of indoor scenes; the trained model performs well on typical structured indoor scenes. Second, a soft self-attention mechanism extracts the disparity between the upper and lower sequences of the input image, and this disparity is used to correct the depth map obtained in the first step and improve its accuracy. Finally, to obtain objects with clear, well-defined boundaries in the depth map, nearest-neighbor regression is applied to correct object contours. Experimental results show that the proposed method infers depth well for unstructured indoor scenes: the recovered objects exhibit distinct texture, strong geometric features, clear contour edges, and fine layering, and depth errors caused by reflective and highlight regions are eliminated.
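The three-stage pipeline described above can be sketched in miniature. This is an illustrative toy only: the function names, the row-wise self-attention formulation, and the horizontal nearest-neighbor snapping rule are all assumptions standing in for the paper's trained encoder-decoder, its upper/lower-sequence disparity correction, and its contour regression; none of this is the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_refine(coarse_depth):
    """Soft self-attention across image rows: each row of the coarse depth
    map is re-estimated as an attention-weighted mixture of all rows, a toy
    stand-in for the upper/lower-sequence disparity correction (stage 2)."""
    scores = coarse_depth @ coarse_depth.T / np.sqrt(coarse_depth.shape[1])
    weights = softmax(scores, axis=1)  # row-to-row attention weights
    return weights @ coarse_depth

def nearest_neighbor_sharpen(depth, edge_thresh=0.5):
    """Snap each pixel lying on a depth discontinuity to whichever horizontal
    neighbor it is closer to, sharpening object contours (stage 3)."""
    out = depth.copy()
    rows, cols = depth.shape
    for i in range(rows):
        for j in range(1, cols - 1):
            left, right = depth[i, j - 1], depth[i, j + 1]
            if abs(right - left) > edge_thresh:  # blurred contour pixel
                out[i, j] = (left if abs(depth[i, j] - left)
                             < abs(depth[i, j] - right) else right)
    return out

# Stage 1 (the trained encoder-decoder) is represented by a toy coarse
# depth map containing one blurred object boundary.
coarse = np.zeros((4, 6))
coarse[:, 3:] = 1.0
coarse[:, 2] = 0.5  # blurred edge between background (0) and object (1)

refined = attention_refine(coarse)               # stage 2: global correction
sharp = nearest_neighbor_sharpen(coarse, 0.5)    # stage 3: contour snapping
```

The nearest-neighbor pass turns the half-depth transition pixel into a hard step, which is the kind of contour sharpening the abstract attributes to its final stage.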
