Real-time scalable structure from motion: from fundamental geometric vision to collaborative mapping

Good egomotion estimation forms the backbone of any modern high-performance localization and navigation system. Compared with global positioning systems or laser-based range measurement devices, cameras are an increasingly attractive alternative that promises applicability in a vast number of related scenarios. The possible fields of application extend from indoor to outdoor, small-scale to large-scale, and underwater to aerial operations. The computer vision community has investigated camera-based egomotion estimation for more than three decades, and this research has led to impressive results. Two fundamental approaches have been pursued. The first is purely geometric: the incremental transformation between consecutive images is computed each time by absolute or relative camera-pose algorithms, based on the identified feature correspondences. The second additionally takes time information into account and estimates priors on the relative camera displacement by means of a motion model. Either approach can be extended with additional sensor information, such as inertial readings from an IMU, GPS data, or laser range information. The present dissertation contributes to purely geometric visual egomotion-estimation approaches, which are of major importance for initializing model-based solutions and for robustifying motion estimation under challenging dynamics. The scope ranges from fundamental algebraic geometry to the practical realization of systems for real-time motion estimation on real-life image sequences. The main cornerstones of this dissertation are a number of novel geometric solutions for absolute and relative camera-pose computation in the calibrated case. The presented minimal solution for absolute camera-pose computation sets a new standard in terms of efficiency and numerical robustness. This algorithm is then extended to the non-perspective case via minimal and linear-complexity n-point solutions. Finally, the derivation of an intuitive novel epipolar constraint leads to a minimal solution for direct translation-independent computation of the
