Flow-based SLAM: From geometry computation to learning

Abstract

Simultaneous localization and mapping (SLAM) has attracted considerable research interest from the robotics and computer-vision communities for more than 30 years. Thanks to steady, progressive efforts, modern SLAM systems now support robust, online operation in real-world scenes. Examining the evolution of this powerful perception tool in detail, we observe that two insights have been persistently retained: incremental computation and temporal guidance. We refer to this temporal continuity as a flow basis and present the first survey that focuses specifically on the flow-based nature of SLAM, ranging from geometric computation to emerging learning techniques. We begin by reviewing two essential stages of geometric computation, presenting the de facto standard pipeline and problem formulation together with the use of temporal cues. We then summarize recently emerging techniques across a wide range of areas, such as learning techniques, sensor fusion, and continuous-time trajectory modeling. This survey aims to draw attention to how robust SLAM systems benefit from continuous observation, and to highlight topics worthy of further investigation for better exploiting temporal cues.
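One common, concrete form of temporal guidance in tracking is a constant-velocity motion model: the relative motion between the two most recent poses is assumed to repeat, and the resulting prediction initializes matching or optimization for the next frame. The sketch below illustrates this idea in the plane (SE(2)), assuming poses are given as (x, y, theta) tuples; the function names are hypothetical, and it is a minimal illustration rather than the method of any particular system covered by the survey.

```python
import math
from typing import Tuple

# Hypothetical planar pose representation: (x, y, theta) in a world frame.
Pose2D = Tuple[float, float, float]


def wrap(theta: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Return a (+) b: apply relative pose b expressed in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, wrap(at + bt))


def inverse(a: Pose2D) -> Pose2D:
    """Return the inverse pose, i.e. (R^T, -R^T t) for a = (R, t)."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-c * ax - s * ay, s * ax - c * ay, wrap(-at))


def predict_constant_velocity(prev: Pose2D, curr: Pose2D) -> Pose2D:
    """Predict the next pose by reapplying the last inter-frame motion.

    delta is the motion from frame k-1 to frame k, expressed in frame k-1;
    assuming it repeats, the prediction for frame k+1 is curr (+) delta.
    """
    delta = compose(inverse(prev), curr)
    return compose(curr, delta)


if __name__ == "__main__":
    # Move 1 unit forward while turning 90 degrees left, then predict the next step.
    print(predict_constant_velocity((0.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2)))
```

In a real tracker the predicted pose would only seed a subsequent refinement step (e.g. feature matching followed by pose optimization); a constant-velocity guess is cheap, but it degrades under abrupt motion.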
