BAD SLAM: Bundle Adjusted Direct RGB-D SLAM

A key component of Simultaneous Localization and Mapping (SLAM) systems is the joint optimization of the estimated 3D map and camera trajectory. Bundle adjustment (BA) is the gold standard for this. Due to the large number of variables in dense RGB-D SLAM, previous work has focused on approximating BA. In contrast, in this paper we present a novel, fast direct BA formulation which we implement in a real-time dense RGB-D SLAM algorithm. In addition, we show that direct RGB-D SLAM systems are highly sensitive to rolling shutter, RGB and depth sensor synchronization, and calibration errors. In order to facilitate state-of-the-art research on direct RGB-D SLAM, we propose a novel, well-calibrated benchmark for this task that uses synchronized global shutter RGB and depth cameras. It includes a training set, a test set without public ground truth, and an online evaluation service. We observe that the ranking of methods changes on this dataset compared to existing ones, and our proposed algorithm outperforms all other evaluated SLAM methods. Our benchmark and our open source SLAM algorithm are available at: www.eth3d.net
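To see why a small RGB/depth synchronization offset can matter for direct methods, here is a back-of-the-envelope calculation of how far a scene point drifts in the image between two frames captured `offset_s` seconds apart. This is an illustrative sketch, not the paper's method or analysis. It rests on several assumptions: a simple small-motion pinhole model, a point near the image center, and the worst case in which the rotational and translational shifts add up in the same image direction. The function name and the example values (10 ms offset, 60°/s rotation, a 525 px focal length) are hypothetical.

```python
import math


def sync_offset_error_px(offset_s, angular_velocity_deg_s, focal_px,
                         linear_velocity_m_s=0.0, depth_m=1.0):
    """Hypothetical helper: approximate image-space misalignment (pixels)
    between RGB and depth frames captured offset_s seconds apart.

    Assumes a pinhole camera and a point near the principal point:
      - rotation about an axis perpendicular to the optical axis shifts
        the point by about focal_px * tan(omega * dt),
      - translation parallel to the image plane shifts a point at depth Z
        by about focal_px * v * dt / Z.
    The two terms are summed as a worst case (same image direction).
    """
    angle_rad = math.radians(angular_velocity_deg_s) * offset_s
    rotational_px = focal_px * math.tan(angle_rad)
    translational_px = focal_px * linear_velocity_m_s * offset_s / depth_m
    return rotational_px + translational_px


if __name__ == "__main__":
    # Assumed example: 10 ms offset, camera turning at 60 deg/s, f = 525 px.
    print(round(sync_offset_error_px(0.01, 60.0, 525.0), 2))
    # Same offset, plus 1 m/s sideways motion looking at a wall 2 m away.
    print(round(sync_offset_error_px(0.01, 60.0, 525.0, 1.0, 2.0), 2))
```

Under these assumptions, a 10 ms offset during a moderate 60°/s turn already misaligns color and depth by roughly 5.5 px. For a method that compares intensities and depths pixel by pixel, that is a large error.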
