Multi-body Motion Estimation from Monocular Vehicle-Mounted Cameras

This paper addresses the problem of simultaneously estimating a vehicle's ego motion and the motions of multiple moving objects in the scene (called eoru motions) using a monocular vehicle-mounted camera. Localizing multiple moving objects and estimating their motions is crucial for autonomous vehicles. Conventional localization and mapping techniques (e.g., visual odometry and simultaneous localization and mapping) can only estimate the ego motion of the vehicle. The capability of a robot localization pipeline to deal with multiple motions has not been widely investigated in the literature. We present a theoretical framework for robust estimation of multiple relative motions in addition to the camera ego motion. The framework is first introduced for general unconstrained motion and is then adapted to exploit the vehicle kinematic constraints to increase efficiency. The method is based on projective factorization of the multiple-trajectory matrix. The ego motion is segmented first, and several hypotheses are then generated for the eoru motions. All hypotheses are evaluated, and the one with the smallest reprojection error is selected. The proposed framework does not need any a priori knowledge of the number of motions and is robust to noisy image measurements. The method with the constrained motion model is evaluated on a popular street-level image dataset collected in urban environments (the KITTI dataset), including several relative ego-motion and eoru-motion scenarios. A benchmark dataset (Hopkins 155) is used to evaluate the method with the general motion model. The results are compared with those of state-of-the-art methods that address a similar problem, referred to as multibody structure from motion in the computer vision community.
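The hypothesis-selection step described above (evaluate every motion hypothesis and keep the one with the smallest reprojection error) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain pinhole model with 3x4 camera matrices, uses toy noise-free data, and all function and variable names (`project`, `mean_reprojection_error`, `select_hypothesis`, the hypothesis labels) are hypothetical.

```python
from math import hypot


def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 camera matrix P (pinhole model)."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(p * c for p, c in zip(row, xh)) for row in P)
    return (u / w, v / w)


def mean_reprojection_error(P, points_3d, observations):
    """Mean Euclidean distance between projected and observed image points."""
    errors = []
    for X, (u_obs, v_obs) in zip(points_3d, observations):
        u, v = project(P, X)
        errors.append(hypot(u - u_obs, v - v_obs))
    return sum(errors) / len(errors)


def select_hypothesis(hypotheses, points_3d, observations):
    """Return the label of the hypothesis with the smallest mean reprojection error."""
    return min(
        hypotheses,
        key=lambda label: mean_reprojection_error(
            hypotheses[label], points_3d, observations
        ),
    )


# Toy scene (assumed values): four 3D points in front of the camera.
points_3d = [(0.0, 0.0, 4.0), (1.0, 0.0, 5.0), (0.0, 1.0, 4.0), (-1.0, 1.0, 6.0)]

# Candidate motions, each encoded as a 3x4 matrix [I | t] with a different
# translation along x. Hypothesis "B" is the one used to generate the data.
hypotheses = {
    "A": [[1, 0, 0, 0.0], [0, 1, 0, 0], [0, 0, 1, 0]],
    "B": [[1, 0, 0, 0.5], [0, 1, 0, 0], [0, 0, 1, 0]],
    "C": [[1, 0, 0, -0.5], [0, 1, 0, 0], [0, 0, 1, 0]],
}

# Noise-free observations synthesized from hypothesis "B".
observations = [project(hypotheses["B"], X) for X in points_3d]

best = select_hypothesis(hypotheses, points_3d, observations)
print(best)
```

With noisy measurements no hypothesis would reach zero error, but the same minimum-error rule applies; how the hypotheses themselves are generated (projective factorization of the trajectory matrix) is outside the scope of this sketch.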
