Ego-Motion Estimation Using Recurrent Convolutional Neural Networks through Optical Flow Learning

Visual odometry (VO) refers to incremental estimation of the motion state of an agent (e.g., vehicle and robot) by using image information, and is a key component of modern localization and navigation systems. Addressing the monocular VO problem, this paper presents a novel end-to-end network for estimation of camera ego-motion. The network learns the latent subspace of optical flow (OF) and models sequential dynamics so that the motion estimation is constrained by the relations between sequential images. We compute the OF field of consecutive images and extract the latent OF representation in a self-encoding manner. A Recurrent Neural Network is then followed to examine the OF changes, i.e., to conduct sequential learning. The extracted sequential OF subspace is used to compute the regression of the 6-dimensional pose vector. We derive three models with different network structures and different training schemes: LS-CNN-VO, LS-AE-VO, and LS-RCNN-VO. Particularly, we separately train the encoder in an unsupervised manner. By this means, we avoid non-convergence during the training of the whole network and allow more generalized and effective feature representation. Substantial experiments have been conducted on KITTI and Malaga datasets, and the results demonstrate that our LS-RCNN-VO outperforms the existing learning-based VO approaches.

[1]  Paolo Valigi,et al.  Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation , 2016, IEEE Robotics and Automation Letters.

[2]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[3]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Peyman Kabiri,et al.  A Novel Feature-Based Approach for Indoor Monocular SLAM , 2018, Electronics.

[5]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Hao Fu,et al.  Large-Scale Outdoor SLAM Based on 2D Lidar , 2019, Electronics.

[7]  Gabriele Costante,et al.  LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation , 2017, IEEE Robotics and Automation Letters.

[8]  Sen Wang,et al.  VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem , 2017, AAAI.

[9]  Myeongseob Ko,et al.  Illumination-Insensitive Skin Depth Estimation from a Light-Field Camera Based on CGANs toward Haptic Palpation , 2018, Electronics.

[10]  Sen Wang,et al.  VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Matthew Chin Heng Chua,et al.  Computational identification of vesicular transport proteins from sequences using deep gated recurrent units architecture , 2019, Computational and structural biotechnology journal.

[12]  Paolo Valigi,et al.  Evaluation of non-geometric methods for visual odometry , 2014, Robotics Auton. Syst..

[13]  Nguyen Quoc Khanh Le,et al.  ET-GRU: using multi-layer gated recurrent units to identify electron transport proteins , 2019, BMC Bioinformatics.

[14]  Torsten Sattler,et al.  Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Dongbing Gu,et al.  UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[16]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.

[17]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Ting Rui,et al.  LAP-SLAM: A Line-Assisted Point-Based Monocular VSLAM , 2019 .

[19]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Yingping Huang,et al.  Stereo Visual Odometry Based on Motion Decoupling and Special Feature Screening for Navigation of Autonomous Vehicles , 2019, IEEE Sensors Journal.

[21]  Michael J. Black,et al.  Optical Flow Estimation Using a Spatial Pyramid Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yasin Almalioglu,et al.  GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[23]  Sen Wang,et al.  End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks , 2018, Int. J. Robotics Res..

[24]  Daniel Cremers,et al.  Semi-dense Visual Odometry for a Monocular Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Tucker R. Balch,et al.  Memory-based learning for visual odometry , 2008, 2008 IEEE International Conference on Robotics and Automation.

[26]  Dah-Jye Lee,et al.  Optimization and Implementation of Synthetic Basis Feature Descriptor on FPGA , 2020 .

[27]  Yalin Yang,et al.  A Novel FastSLAM Framework Based on 2D Lidar for Autonomous Mobile Robot , 2020, Electronics.

[28]  Hongbin Zha,et al.  Guided Feature Selection for Deep Visual Odometry , 2018, ACCV.

[29]  Sen Wang,et al.  DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[30]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Frank L. Lewis,et al.  Non-Uniform Discretization-based Ordinal Regression for Monocular Depth Estimation of an Indoor Drone , 2020 .

[32]  Wolfram Burgard,et al.  A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[33]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Davide Scaramuzza,et al.  SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[35]  Antoni Grau,et al.  Cooperative Visual-SLAM System for UAV-Based Target Tracking in GPS-Denied Environments: A Target-Centric Approach , 2020 .

[36]  Tao Yang,et al.  Monocular Vision SLAM-Based UAV Autonomous Landing in Emergencies and Unknown Environments , 2018 .

[37]  Hongbin Zha,et al.  Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Francisco Angel Moreno,et al.  The Málaga urban dataset: High-rate stereo and LiDAR in a realistic urban scenario , 2014, Int. J. Robotics Res..

[39]  Fabio Ramos,et al.  Semi-parametric learning for visual odometry , 2013, Int. J. Robotics Res..