Unsupervised Deep Learning-Based RGB-D Visual Odometry

Recently, deep learning frameworks have been deployed in visual odometry (VO) systems and have achieved results comparable to those of traditional feature-matching-based systems. However, most deep learning-based frameworks require labeled ground-truth data for training. In addition, monocular odometry systems cannot recover absolute scale, so external or prior information must be introduced to recover it. To address these problems, we present a novel deep learning-based RGB-D visual odometry system. Our two main contributions are as follows: (i) we propose a dual-stream deep neural network in which, during both training and pose estimation, the depth images are fed into the network as a second stream alongside the RGB images; (ii) the system is trained end to end in an unsupervised manner, so the labor-intensive task of data labeling is not required. We tested our system on the KITTI dataset, and the results show that the proposed RGB-D VO system achieves lower translation and rotation errors than other state-of-the-art systems.
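The abstract does not spell out the unsupervised training objective. A common formulation in unsupervised VO, assumed here and not confirmed by the abstract, is view synthesis: each target pixel is lifted to 3D using its depth, moved into the neighbouring frame with the estimated pose, reprojected, and compared photometrically with the source image. The minimal sketch below shows that geometry in plain Python. The function names (`back_project`, `transform`, `project`, `photometric_l1`), the pinhole intrinsics `fx, fy, cx, cy`, and the nearest-neighbour sampling (real networks typically use differentiable bilinear sampling) are all illustrative assumptions, not the authors' implementation.

```python
"""Sketch of a depth-based view-synthesis residual for unsupervised RGB-D VO.

Assumptions (hypothetical, not from the paper): pinhole intrinsics
(fx, fy, cx, cy), depth in metres, and a pose (R, t) that maps points from
the target camera frame into the source camera frame. Sampling is
nearest-neighbour for simplicity; a trainable network would use bilinear.
"""
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(p: Vec3, R: Matrix3, t: Vec3) -> Vec3:
    """Apply the rigid-body transform p' = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p: Vec3, fx: float, fy: float, cx: float,
            cy: float) -> Optional[Tuple[float, float]]:
    """Project a camera-frame point to pixel coordinates; None if behind camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def photometric_l1(target: List[List[float]], source: List[List[float]],
                   depth: List[List[float]], R: Matrix3, t: Vec3,
                   fx: float, fy: float, cx: float, cy: float) -> float:
    """Mean absolute intensity difference between each target pixel and the
    source pixel it lands on after warping with depth and pose."""
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0:  # skip pixels with no valid depth reading
                continue
            point = transform(back_project(u, v, d, fx, fy, cx, cy), R, t)
            uv = project(point, fx, fy, cx, cy)
            if uv is None:
                continue
            us, vs = round(uv[0]), round(uv[1])  # nearest-neighbour sampling
            if 0 <= us < w and 0 <= vs < h:
                total += abs(target[v][u] - source[vs][us])
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    intrinsics = (2.0, 2.0, 1.5, 1.5)  # hypothetical fx, fy, cx, cy
    target = [[float(u) for u in range(4)] for _ in range(4)]
    source = [[float(u - 1) for u in range(4)] for _ in range(4)]
    depth_2m = [[2.0] * 4 for _ in range(4)]
    depth_4m = [[4.0] * 4 for _ in range(4)]
    # Wrong pose (no motion): every pixel is off by one intensity level.
    print(photometric_l1(target, source, depth_2m, identity, (0.0, 0.0, 0.0), *intrinsics))  # 1.0
    # Correct pose: a 1 m x-translation at 2 m depth shifts pixels by one column.
    print(photometric_l1(target, source, depth_2m, identity, (1.0, 0.0, 0.0), *intrinsics))  # 0.0
    # Doubling both depth and translation yields the same warp.
    print(photometric_l1(target, source, depth_4m, identity, (2.0, 0.0, 0.0), *intrinsics))  # 0.0
```

The last two calls illustrate the scale issue raised in the abstract: scaling depth and translation together leaves the reprojection unchanged, so from images alone a monocular system only observes translation up to an unknown factor. Feeding metric depth into the loss, as an RGB-D system can, pins that factor down.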
