LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation

This work proposes a novel deep network architecture for camera ego-motion estimation. A motion estimation network generally learns features similar to optical flow (OF) fields from sequences of images. The OF field can be described by a lower-dimensional latent space, and previous research has shown how to find linear approximations of this space. We propose to use an autoencoder network to find a nonlinear representation of the OF manifold, and to learn this latent space jointly with the estimation task, so that the learned OF features become a more robust description of the OF input. We call this architecture Latent Space Visual Odometry (LS-VO). Experiments show that LS-VO achieves a considerable increase in performance with respect to baselines, while the number of parameters of the estimation network increases only slightly.
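The joint formulation described above can be sketched as a two-headed network sharing one encoder: an autoencoder branch reconstructs the optical-flow input, while a second head regresses the ego-motion from the same latent code, and the two losses are minimized together. The following is a minimal, hypothetical NumPy sketch of that objective; the layer sizes, the linear maps standing in for convolutional stages, and the loss weight `lambda_vo` are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a flattened optical-flow field and a 6-DoF motion vector.
OF_DIM, LATENT_DIM, POSE_DIM = 128, 16, 6

# Randomly initialised linear maps standing in for the real conv layers.
W_enc = rng.normal(0, 0.1, (LATENT_DIM, OF_DIM))   # shared encoder
W_dec = rng.normal(0, 0.1, (OF_DIM, LATENT_DIM))   # decoder (autoencoder head)
W_vo  = rng.normal(0, 0.1, (POSE_DIM, LATENT_DIM)) # ego-motion head

def joint_loss(of_batch, pose_batch, lambda_vo=1.0):
    """OF reconstruction loss plus weighted ego-motion loss on a shared latent code."""
    z = np.tanh(of_batch @ W_enc.T)        # nonlinear latent representation of the OF
    of_rec = z @ W_dec.T                   # reconstructed flow field
    pose_pred = z @ W_vo.T                 # 6-DoF relative-motion estimate
    l_rec = np.mean((of_rec - of_batch) ** 2)
    l_vo = np.mean((pose_pred - pose_batch) ** 2)
    return l_rec + lambda_vo * l_vo

of = rng.normal(size=(8, OF_DIM))      # batch of flattened flow fields
pose = rng.normal(size=(8, POSE_DIM))  # ground-truth relative motions
loss = joint_loss(of, pose)
```

Because the encoder is trained against both terms, the latent code cannot overfit to reconstruction alone: it must also remain predictive of ego-motion, which is the intuition behind the robustness claim.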
