Observability Properties and Deterministic Algorithms in Visual-Inertial Structure from Motion

This paper discusses the visual-inertial structure from motion (VI-SfM) problem, with special focus on three fundamental issues: observability properties, resolvability in closed form, and data association. Regarding the first issue, after a discussion of the current state of the art, the paper investigates more complex scenarios. Specifically, whereas the common formulation assumes three orthogonal accelerometers and three orthogonal gyroscopes, the analysis is extended to cover a reduced number of inertial sensors and any number of point features observed by monocular vision. In particular, the minimal case of a single accelerometer, no gyroscope and a single point feature is addressed. Additionally, the analysis accounts for biased measurements and unknown extrinsic camera calibration. The results derived for these new and very challenging scenarios have interesting consequences from both a technological and a neuroscientific perspective. Regarding the second issue, a simple closed-form solution to the VI-SfM problem is presented. This solution expresses the structure of the scene and the motion only in terms of the visual and inertial measurements collected during a short time interval. This makes it possible to introduce deterministic algorithms that simultaneously determine the structure of the scene and the motion without any initialization or prior knowledge. Additionally, the closed-form solution allows us to identify the conditions under which the VI-SfM problem has a finite number of solutions. Specifically, it is shown that the problem can have a unique solution, two distinct solutions or infinitely many solutions, depending on the trajectory, on the number of point features and their arrangement in 3D space, and on the number of camera images. Finally, the paper discusses the third issue, i.e., the data association problem. Starting from basic results in computer vision, it is shown that, by exploiting the information provided by the inertial measurements, a single point correspondence (in the case of planar motion) or two point correspondences (for general 3D motion) are sufficient to characterize the motion between two camera poses. This allows us to use a 1-point RANSAC algorithm (in the planar case) or a 2-point RANSAC algorithm (in the general 3D case) to detect outliers. The paper concludes with a discussion of connections to related research fields in both computer science and neuroscience.
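
To make the 2-point idea concrete, the following is a minimal sketch and not the paper's own algorithm. It assumes that the relative rotation R between the two camera poses is already known (for instance, from integrating gyroscope data) and that features are given as unit bearing vectors. Under that assumption the epipolar constraint becomes linear in the translation t: each correspondence (x1, x2) gives t · ((R x1) × x2) = 0, so two correspondences fix the direction of t up to sign. The function name, threshold and iteration count are illustrative choices.

```python
import numpy as np

def ransac_two_point(R, x1, x2, iters=200, thresh=1e-3, seed=0):
    """Estimate the translation direction between two views with known rotation.

    R      : (3, 3) rotation taking frame-1 coordinates to frame-2 (assumed known).
    x1, x2 : (N, 3) unit bearing vectors of the same features in frames 1 and 2.
    Returns (t, inliers): unit translation direction (sign ambiguous) and a
    boolean inlier mask. t is None if no valid hypothesis was found.
    """
    rng = np.random.default_rng(seed)
    # Each row is the epipolar-plane normal (R x1_i) x x2_i; t must be orthogonal to it.
    normals = np.cross(x1 @ R.T, x2)
    normals /= np.maximum(np.linalg.norm(normals, axis=1, keepdims=True), 1e-12)

    best_t = None
    best_inliers = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(x1), size=2, replace=False)
        # Two constraints t.n_i = 0 and t.n_j = 0 give t along n_i x n_j.
        t = np.cross(normals[i], normals[j])
        norm = np.linalg.norm(t)
        if norm < 1e-9:  # degenerate sample (parallel normals)
            continue
        t /= norm
        # Residual: sine of the angle between t and each epipolar plane.
        inliers = np.abs(normals @ t) < thresh
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

Because each hypothesis needs only two correspondences, far fewer random samples are required than with a five-point or eight-point minimal solver at the same outlier ratio, which is the practical appeal of exploiting inertial information for outlier detection.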
