Omnidirectional Depth Extension Networks

Omnidirectional 360° camera proliferates rapidly for autonomous robots since it significantly enhances the perception ability by widening the field of view (FoV). However, corresponding 360° depth sensors, which are also critical for the perception system, are still difficult or expensive to have. In this paper, we propose a low-cost 3D sensing system that combines an omnidirectional camera with a calibrated projective depth camera, where the depth from the limited FoV can be automatically extended to the rest of recorded omnidirectional image. To accurately recover the missing depths, we design an omnidirectional depth extension convolutional neural network (ODE-CNN), in which a spherical feature transform layer (SFTL) is embedded at the end of feature encoding layers, and a deformable convolutional spatial propagation network (D-CSPN) is appended at the end of feature decoding layers. The former re-samples the neighborhood of each pixel in the omnidirectional coordination to the projective coordination, which reduce the difficulty of feature learning, and the later automatically finds a proper context to well align the structures in the estimated depths via CNN w.r.t. the reference image, which significantly improves the visual quality. Finally, we demonstrate the effectiveness of proposed ODE-CNN over the popular 360D dataset, and show that ODE-CNN significantly outperforms (relatively 33% reduction in depth error) other state-of-the-art (SoTA) methods.

[1]  Ji Zhang,et al.  LOAM: Lidar Odometry and Mapping in Real-time , 2014, Robotics: Science and Systems.

[2]  Petros Daras,et al.  OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas , 2018, ECCV.

[3]  Ruigang Yang,et al.  Learning Depth with Convolutional Spatial Propagation Network , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  H. Macher,et al.  FIRST EXPERIENCES WITH KINECT V2 SENSOR FOR CLOSE RANGE 3D MODELLING , 2015 .

[5]  Pascal Frossard,et al.  Graph-Based Classification of Omnidirectional Images , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[6]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[7]  Andreas Geiger,et al.  SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images , 2018, ECCV.

[8]  Nassir Navab,et al.  Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images , 2018, ECCV.

[9]  Wei Xu,et al.  LEGO: Learning Edge with Geometry all at Once by Watching Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Ruigang Yang,et al.  Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network , 2018, ECCV.

[11]  Guosheng Lin,et al.  Deep convolutional neural fields for depth estimation from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  H. Hirschmüller Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information , 2005, CVPR.

[13]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[14]  Krista A. Ehinger,et al.  Recognizing scene viewpoint using panoramic place representation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Nicu Sebe,et al.  Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[17]  Sertac Karaman,et al.  Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[18]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[20]  Kristen Grauman,et al.  Flat2Sphere: Learning Spherical Convolution for Fast Features from 360° Imagery , 2017, NIPS 2017.

[21]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[22]  Ming Yang,et al.  Restricted Deformable Convolution-Based Road Scene Semantic Segmentation Using Surround View Cameras , 2018, IEEE Transactions on Intelligent Transportation Systems.

[23]  Max Welling,et al.  Spherical CNNs , 2018, ICLR.

[24]  Silvio Savarese,et al.  Joint 2D-3D-Semantic Data for Indoor Scene Understanding , 2017, ArXiv.

[25]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Rafael Monroy,et al.  SalNet360: Saliency Maps for omni-directional images with CNN , 2017, Signal Process. Image Commun..

[27]  Shipeng Li,et al.  Texture-assisted Kinect depth inpainting , 2012, 2012 IEEE International Symposium on Circuits and Systems.

[28]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[30]  Junmo Kim,et al.  Active Convolution: Learning the Shape of Convolution for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Dinesh Manocha,et al.  Getting Robots Unfrozen and Unlost in Dense Pedestrian Crowds , 2018, IEEE Robotics and Automation Letters.

[32]  Thomas Brox,et al.  Artistic Style Transfer for Videos and Spherical Images , 2017, International Journal of Computer Vision.

[33]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[34]  Yoshimitsu Aoki,et al.  Depth image enhancement using local tangent plane approximations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Alan L. Yuille,et al.  Towards unified depth and semantic prediction from a single image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Xu Zhao,et al.  EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching , 2018, ACCV.

[39]  Alan L. Yuille,et al.  SURGE: Surface Regularized Geometry Estimation from a Single Image , 2016, NIPS.

[40]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[41]  Jan-Michael Frahm,et al.  Mapped Convolutions , 2019, ArXiv.

[42]  Toby P. Breckon,et al.  Real-Time Low-Cost Omni-Directional Stereo Vision via Bi-polar Spherical Cameras , 2018, ICIAR.

[43]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Kristen Grauman,et al.  Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Max Welling,et al.  Convolutional Networks for Spherical Signals , 2017, ArXiv.

[46]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).