ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation

We present a practical backend for stereo visual SLAM that simultaneously discovers individual rigid bodies and computes their motions in dynamic environments. Recent factor-graph-based state optimization algorithms solve SLAM problems robustly by treating dynamic objects as outliers, but they rarely estimate the motions of those objects. In this paper, we exploit the consensus of 3D motions among landmarks extracted from the same rigid body to cluster and estimate static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix over the landmarks and uses agglomerative clustering to distinguish the rigid bodies. Combined with a decoupled factor graph optimization that refines their shapes and trajectories, this yields an iterative scheme in which cluster assignments and motion estimates update each other. Evaluations on synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments on online efficiency show that the method is effective for simultaneously tracking ego-motion and multiple objects.
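
The clustering idea can be illustrated with a small sketch. It is not the paper's formulation: the abstract does not define the affinity measure, so the sketch assumes a simplified stand-in in which two landmarks are compatible when the distance between them stays nearly constant across frames, as it would for points on one rigid body. The function names, the `noise_sigma` and `k` parameters, and the choice of complete linkage are all hypothetical.

```python
import math
from itertools import combinations


def motion_distance(traj_a, traj_b):
    """Spread (max - min) of the distance between two landmarks over all frames.

    Assumption: points on the same rigid body keep a constant mutual distance,
    so a small spread suggests a shared rigid motion.
    """
    d = [math.dist(p, q) for p, q in zip(traj_a, traj_b)]
    return max(d) - min(d)


def cluster_rigid_bodies(trajs, noise_sigma=0.05, k=3.0):
    """Agglomerative (complete-linkage) clustering of landmark trajectories.

    trajs: list of trajectories, each a list of (x, y, z) positions per frame.
    noise_sigma, k: hypothetical parameters; merging stops once the best
    candidate merge has a linkage above k * noise_sigma.
    """
    n = len(trajs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = motion_distance(trajs[i], trajs[j])

    clusters = [[i] for i in range(n)]
    threshold = k * noise_sigma
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the least compatible pair decides the cost.
            link = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    frames = range(4)
    static = [[(x, y, 0.0) for _ in frames] for x, y in [(0, 0), (1, 0), (0, 1)]]
    moving = [[(5 + 0.5 * t, y, 0.0) for t in frames] for y in (0, 1)]
    print(cluster_rigid_bodies(static + moving))
```

In the full method described above, such a clustering step would alternate with a factor graph optimization that refines each body's shape and trajectory; that optimization is omitted from this sketch.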
