SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems

Direct methods for visual odometry (VO) have gained popularity for their ability to exploit information from all intensity gradients in the image. However, low computational speed and missing guarantees of optimality and consistency limit direct methods, whereas established feature-based methods succeed in these respects. Based on these considerations, we propose a semidirect VO (SVO) that uses direct methods to track and triangulate pixels characterized by high image gradients, but relies on proven feature-based methods for joint optimization of structure and motion. Together with a robust probabilistic depth estimation algorithm, this enables us to efficiently track pixels lying on weak corners and edges in environments with little or high-frequency texture. We further demonstrate that the algorithm can easily be extended to multiple cameras, to track edges, to include motion priors, and to use very-large-field-of-view cameras, such as fisheye and catadioptric ones. Experimental evaluation on benchmark datasets shows that the algorithm is significantly faster than the state of the art while achieving highly competitive accuracy.
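The abstract mentions a robust probabilistic depth estimation algorithm without spelling it out. As a hedged illustration only, the sketch below assumes a per-pixel filter in the style of Vogiatzis and Hernández (video-based multi-view stereo): each inverse-depth measurement is either an inlier drawn from a Gaussian around the true value or an outlier drawn uniformly from a fixed range, and the posterior is approximated by a Gaussian over inverse depth times a Beta distribution over the inlier ratio, refit after every measurement by moment matching. The names `DepthSeed`, `update_seed`, `rho_range`, the inverse-depth parameterization, and all numeric values are hypothetical choices for this sketch, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class DepthSeed:
    """Approximate posterior Beta(pi | a, b) * Normal(rho | mu, sigma2).

    rho is inverse depth (an assumption of this sketch); pi is the probability
    that a measurement is an inlier rather than a uniform outlier.
    """
    a: float
    b: float
    mu: float
    sigma2: float
    rho_range: float  # width of the uniform outlier distribution over inverse depth


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of Normal(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_seed(seed: DepthSeed, x: float, tau2: float) -> None:
    """Fuse one inverse-depth measurement x with variance tau2, in place."""
    a, b, mu, sigma2 = seed.a, seed.b, seed.mu, seed.sigma2

    # Inlier branch: product of prior Gaussian and measurement Gaussian.
    s2 = 1.0 / (1.0 / sigma2 + 1.0 / tau2)
    m = s2 * (mu / sigma2 + x / tau2)

    # Normalized weights of the inlier (Gaussian) and outlier (uniform) branches.
    c1 = a / (a + b) * gaussian_pdf(x, mu, sigma2 + tau2)
    c2 = b / (a + b) / seed.rho_range
    norm = c1 + c2
    c1, c2 = c1 / norm, c2 / norm

    # First and second moments of the inlier ratio under the exact posterior mixture.
    f = c1 * (a + 1) / (a + b + 1) + c2 * a / (a + b + 1)
    e = (c1 * (a + 1) * (a + 2) + c2 * a * (a + 1)) / ((a + b + 1) * (a + b + 2))

    # Moment-match the mixture back to a single Gaussian times a single Beta.
    mu_new = c1 * m + c2 * mu
    seed.sigma2 = c1 * (s2 + m * m) + c2 * (sigma2 + mu * mu) - mu_new * mu_new
    seed.mu = mu_new
    seed.a = (e - f) / (f - e / f)
    seed.b = seed.a * (1.0 - f) / f


# Illustrative run with made-up values: repeated measurements near 0.30, then one gross outlier.
seed = DepthSeed(a=10.0, b=10.0, mu=0.5, sigma2=0.25 ** 2, rho_range=1.0)
for x in [0.29, 0.31, 0.30] * 5:
    update_seed(seed, x, tau2=0.02 ** 2)
mu_before_outlier = seed.mu
update_seed(seed, 0.95, tau2=0.02 ** 2)
print(f"mu={seed.mu:.4f} sigma={math.sqrt(seed.sigma2):.4f} "
      f"inlier_ratio={seed.a / (seed.a + seed.b):.3f}")
```

In this run the final measurement (0.95) barely moves the estimate: its Gaussian likelihood is negligible, so almost all of its weight goes to the uniform branch and mainly the outlier count `b` grows. The mixture model is what makes the filter robust, since a plain Gaussian update would drag the mean toward every outlier.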
