Real-time monocular dense mapping on aerial robots using visual-inertial fusion

In this work, we present a solution to real-time monocular dense mapping. A tightly-coupled visual-inertial localization module provides metric, high-accuracy odometry. A motion stereo algorithm takes the video input from a single camera and produces local depth measurements with semi-global regularization. These local measurements are then integrated into a global map for noise filtering and map refinement. Indoor and outdoor experiments verify that the resulting global map supports navigation and obstacle avoidance for aerial robots. By distributing computation across the CPU and GPU, our system runs at 10 Hz on an Nvidia Jetson TX1. Through onboard experiments, we demonstrate its ability to close the perception-action loop for autonomous aerial robots. We release our implementation as open-source software.
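
The step of integrating local depth measurements into a global map for noise filtering can be illustrated with a small sketch. The abstract does not specify the fusion rule, so the weighted running average below (the update commonly used in volumetric, signed-distance-style maps) is an assumption, and the names `Cell`, `integrate`, `fuse`, and `max_weight` are hypothetical, not taken from the released implementation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One map cell (hypothetical): a fused value and its accumulated weight."""
    value: float = 0.0   # fused estimate, e.g. a signed distance to the surface
    weight: float = 0.0  # accumulated confidence of that estimate


def integrate(cell: Cell, measurement: float, w: float = 1.0,
              max_weight: float = 100.0) -> Cell:
    """Fold one noisy local measurement into the cell by weighted averaging.

    Assumed update rule, not confirmed by the source:
        value  <- (value * weight + measurement * w) / (weight + w)
        weight <- min(weight + w, max_weight)
    Capping the weight keeps the cell responsive to later observations.
    """
    total = cell.weight + w
    fused = (cell.value * cell.weight + measurement * w) / total
    return Cell(value=fused, weight=min(total, max_weight))


def fuse(measurements) -> Cell:
    """Integrate a sequence of measurements of the same cell, in order."""
    cell = Cell()
    for m in measurements:
        cell = integrate(cell, m)
    return cell


if __name__ == "__main__":
    # Three noisy observations of the same cell average out in the map.
    print(fuse([1.0, 2.0, 3.0]))
```

Averaging repeated observations of the same cell is what lets a global map suppress the noise of individual depth frames; a real system would apply such an update per cell over a 3D grid, typically in parallel on the GPU.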
