Monocular Simultaneous Localisation and Mapping

Simultaneous localisation and mapping (SLAM) is the task of estimating both motion and structure from sensor observations in an unknown environment. Performing SLAM with a single video camera, while an attractive prospect, adds its own particular difficulties to the already considerable general challenges of the problem. This thesis advances the state of the art in monocular SLAM in terms of efficiency, richness of scene description, statistical correctness, and robustness.

First, a SLAM algorithm from the robotics literature, designed to permit efficient operation with complex maps, is adapted to the monocular setting. A method for efficiently and correctly adding landmarks to the map is presented. The implemented SLAM system accurately maps thousands of landmarks in real time, giving an order-of-magnitude performance improvement over previous methods.

Next, the system is extended to allow incorporation of edge landmarks as well as points. Edgelet landmarks and their representation are defined, and a method is described for reliably tracking edgelets, even in the presence of measurement ambiguity. An efficient selection algorithm for acquiring new edgelets from video allows the system to quickly extend the map. The working system produces geometrically accurate and meaningful edge maps at frame rate.

With a focus on preserving statistical consistency during estimation, a novel monocular SLAM algorithm is presented. Estimation proceeds on a graph of local maps, partitioning and coalescing the observations taken from video. Careful parameterisation keeps local maps consistent, while optimisation of the connecting graph structure aids global convergence. The system can handle thousands of landmarks at frame rate, while delivering statistical performance superior to existing methods.

Finally, this thesis mitigates the problems of tracking failure and large-scale localisation with a unified framework for loop closing and recovery. A hierarchical method is presented for finding correspondences between new video images and the existing map, using local and global appearance models and structure estimates. The framework is instantiated within the graph-based monocular SLAM system. The extended implementation continues mapping despite repeated tracking failures, successfully joining maps and closing loops in real time.
