BooM-Vio: Bootstrapped Monocular Visual-Inertial Odometry with Absolute Trajectory Estimation through Unsupervised Deep Learning

Machine learning has emerged as an extraordinary tool for solving many computer vision tasks by extracting and correlating meaningful features from high dimensional inputs in ways that often exceed the best human-derived modeling efforts. However, the area of vision-aided localization remains diverse with many traditional, model-based approaches (i.e. filtering- or nonlinear least- squares- based) often outperforming deep, model-free approaches. In this work, we present Bootstrapped Monocular VIO (BooM), a scaled monocular visual-inertial odometry (VIO) solution that leverages the complex data association ability of model-free approaches with the ability to exploit known geometric dynamics with model-based approaches. Our end-to-end, unsupervised deep neural network simultaneously learns to perform visual-inertial odometry and estimate scene depth while scale is enforced through a loss signal computed from position change magnitude estimates from traditional methods. We evaluate our network against a state-of-the-art (SoA) approach on the KITTI driving dataset as well as a micro aerial vehicle (MAV) dataset that we collected in the AirSim simulation environment. We further demonstrate the benefits of our combined approach through robustness tests on degraded trajectories.

[1]  Tom Drummond,et al.  Faster and Better: A Machine Learning Approach to Corner Detection , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Valentin Peretroukhin,et al.  DPC-Net: Deep Pose Correction for Visual Localization , 2017, IEEE Robotics and Automation Letters.

[3]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Michael Bosse,et al.  Keyframe-based visual–inertial odometry using nonlinear optimization , 2015, Int. J. Robotics Res..

[5]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[6]  Ian D. Reid,et al.  Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Dongbing Gu,et al.  UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[9]  Davide Scaramuzza,et al.  A Tutorial on Quantitative Trajectory Evaluation for Visual(-Inertial) Odometry , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Ashish Kapoor,et al.  AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles , 2017, FSR.

[11]  Kyle Lindgren,et al.  Unsupervised Deep Visual-Inertial Odometry with Online Error Correction for RGB-D Imagery , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Davide Scaramuzza,et al.  SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Jared Shamwell,et al.  Vision-Aided Absolute Trajectory Estimation Using an Unsupervised Deep Network with Online Error Correction , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14]  Stergios I. Roumeliotis,et al.  A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[15]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Shaojie Shen,et al.  VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator , 2017, IEEE Transactions on Robotics.

[17]  Roland Siegwart,et al.  Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback , 2017, Int. J. Robotics Res..

[18]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[19]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[21]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[22]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[23]  Sen Wang,et al.  VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem , 2017, AAAI.