Estimating Metric Scale Visual Odometry from Videos using 3D Convolutional Networks

We present an end-to-end deep learning approach for metric scale-sensitive regression tasks such as visual odometry, using a single camera and no additional sensors. We propose a novel 3D convolutional architecture, 3DC-VO, that leverages temporal relationships over a short moving window of images to estimate linear and angular velocities. The network makes local predictions on stacks of images, and these predictions can be integrated to form a full trajectory. We apply 3DC-VO to the KITTI visual odometry benchmark and to the task of estimating a pilot’s control inputs from a first-person video of a quadrotor flight. Our method exhibits increased accuracy relative to comparable learning-based algorithms trained on monocular images. We also show promising results for quadrotor control input prediction when trained on a new dataset collected with a UAV simulator.
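As an illustration of how local velocity predictions can be integrated into a trajectory, the following is a minimal sketch and not the paper's implementation. It assumes a simplified planar case in which each prediction is a forward speed and a yaw rate held constant over a fixed interval `dt`. The function name `integrate_planar`, the input layout, and the midpoint-heading update are all assumptions made for this example; the abstract does not specify the integration scheme, and a full treatment would integrate 3D linear and angular velocities.

```python
import numpy as np


def integrate_planar(velocities, dt, start=(0.0, 0.0, 0.0)):
    """Dead-reckon a planar trajectory from per-window velocity estimates.

    velocities: array-like of shape (N, 2) with rows
                (forward speed [m/s], yaw rate [rad/s]).  Hypothetical layout.
    dt:         seconds covered by each prediction.
    start:      initial (x, y, yaw) pose.
    Returns an (N + 1, 3) array of (x, y, yaw) poses.
    """
    velocities = np.asarray(velocities, dtype=float)
    poses = np.zeros((len(velocities) + 1, 3))
    poses[0] = start
    for i, (v, w) in enumerate(velocities):
        x, y, yaw = poses[i]
        # Use the heading at the middle of the interval, which reduces
        # the error of a plain forward-Euler step when the vehicle turns.
        mid_yaw = yaw + 0.5 * w * dt
        poses[i + 1] = (
            x + v * dt * np.cos(mid_yaw),
            y + v * dt * np.sin(mid_yaw),
            yaw + w * dt,
        )
    return poses


# Usage: ten predictions of 1 m/s straight ahead, 0.1 s apart.
straight = integrate_planar([[1.0, 0.0]] * 10, dt=0.1)
```

Because each pose is built on the previous one, any error in a local prediction accumulates along the trajectory, which is why the accuracy of the per-window estimates matters for the integrated result.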
