VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem

In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors. To the best of our knowledge, it is the first end-to-end trainable method for visual-inertial odometry that performs fusion of the data at an intermediate feature-representation level. Our method has several advantages over traditional approaches. Specifically, it eliminates the need for tedious manual synchronization of the camera and IMU, as well as the need for manual calibration between the two sensors. A further advantage is that our model naturally and elegantly incorporates domain-specific information, which significantly mitigates drift. We show that our approach is competitive with state-of-the-art traditional methods when accurate calibration data is available, and that it can be trained to outperform them in the presence of calibration and synchronization errors.
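
To make the fusion scheme concrete, below is a minimal sketch of intermediate feature-level visual-inertial fusion. It is written in PyTorch as an assumption (not the authors' implementation), and all names and layer sizes (VIOFusionSketch, visual_dim, imu_hidden, and so on) are illustrative: a per-frame visual feature and an RNN summary of the high-rate IMU samples are concatenated and fed to a core sequence model that regresses se(3) pose increments.

import torch
import torch.nn as nn

class VIOFusionSketch(nn.Module):
    """Hypothetical sketch of feature-level visual-inertial fusion.
    Layer types and sizes are assumptions for illustration only."""
    def __init__(self, visual_dim=1024, imu_dim=6, imu_hidden=128, core_hidden=512):
        super().__init__()
        # Visual branch: a CNN over consecutive image pairs (e.g. optical-flow-style
        # features) would produce this vector; a linear map stands in for it here.
        self.visual_encoder = nn.Linear(visual_dim, visual_dim)
        # Inertial branch: a small LSTM runs over the high-rate IMU samples
        # recorded between two camera frames.
        self.imu_rnn = nn.LSTM(imu_dim, imu_hidden, batch_first=True)
        # Core sequence model: fuses the concatenated features over time.
        self.core_rnn = nn.LSTM(visual_dim + imu_hidden, core_hidden, batch_first=True)
        # Regress a 6-DoF relative pose (an se(3) tangent vector) per frame.
        self.pose_head = nn.Linear(core_hidden, 6)

    def forward(self, visual_feats, imu_seqs):
        # visual_feats: (batch, T, visual_dim), one feature per image pair
        # imu_seqs:     (batch, T, S, imu_dim), S IMU samples per camera interval
        B, T, S, D = imu_seqs.shape
        _, (h, _) = self.imu_rnn(imu_seqs.reshape(B * T, S, D))
        imu_feats = h[-1].reshape(B, T, -1)  # last hidden state per interval
        fused = torch.cat([self.visual_encoder(visual_feats), imu_feats], dim=-1)
        out, _ = self.core_rnn(fused)
        return self.pose_head(out)  # (batch, T, 6) relative-pose increments

Because the inertial branch consumes a window of IMU samples per camera interval rather than individually timestamped measurements, the camera and IMU streams need not be hardware-synchronized or hand-calibrated at the input; the network learns the mapping between the two modalities, which is the property the abstract highlights.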
