Panoramic Depth Estimation via Supervised and Unsupervised Learning in Indoor Scenes

Depth estimation, a necessary cue for lifting 2D images into 3D space, has been applied in many areas of machine vision. However, traditional stereo matching algorithms are poorly suited to full 360° geometric sensing of the surroundings: they suffer from large noise and low accuracy, and they impose strict requirements on multi-camera calibration. In this work, we use panoramic images to obtain a larger field of view and a unified perception of the surroundings. We extend PADENet [IEEE 23rd International Conference on Intelligent Transportation Systems (2020), pp. 1-6, doi:10.1109/ITSC45102.2020.9294206], which first appeared in our previous conference work on outdoor scene understanding, to perform panoramic monocular depth estimation with a focus on indoor scenes. We also adapt the training process of the neural network to the characteristics of panoramic images. In addition, we fuse the traditional stereo matching algorithm with deep learning methods to further improve the accuracy of the depth predictions. A comprehensive set of experiments demonstrates the effectiveness of our schemes for indoor scene perception.
