Not Only Look But Infer: Multiple Hypothesis Clustering of Data Association Inference for Semantic SLAM

In semantic visual simultaneous localization and mapping (VSLAM), accurate data association of semantic measurements from visual sensors is crucial for robot state estimation and scene reconstruction. However, most related works assume a simple world for semantic association, and handling the ambiguity of data association in cluttered environments remains a challenge. In this article, we propose a novel approach that reduces the uncertainty of data association via a multiple hypothesis Dirichlet process (MHDP). The posterior distribution of data association is first inferred by a Dirichlet process (DP). Ambiguous associations from this distribution are handled by a hypothesis tree, and a hypothesis-testing-based ambiguity judgment is proposed for each object measurement to provide a branch-growing strategy for the tree. Moreover, the proposed data association approach is tightly coupled with a geometric feature-based simultaneous localization and mapping (SLAM) system. Qualitative and quantitative evaluations on simulated and real-world public data sets demonstrate the robustness and effectiveness of our approach compared with other data association methods and a state-of-the-art SLAM system.
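
The pipeline described in the abstract (a DP posterior over associations, followed by an ambiguity check that decides whether the hypothesis tree should branch) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the isotropic Gaussian likelihood, the constant base density for a new landmark, the parameter values, and especially the simple posterior-ratio threshold, which stands in for the paper's hypothesis-testing-based ambiguity judgment. The function names are hypothetical.

```python
import math


def gaussian_pdf(z, mean, sigma):
    """Isotropic Gaussian density of point z around mean (assumed likelihood)."""
    d = len(z)
    sq_dist = sum((a - b) ** 2 for a, b in zip(z, mean))
    norm = (2.0 * math.pi * sigma ** 2) ** (-d / 2.0)
    return norm * math.exp(-0.5 * sq_dist / sigma ** 2)


def dp_association_posterior(z, landmarks, alpha=1.0, sigma=0.5, new_density=1e-3):
    """Chinese-restaurant-process style posterior over associations.

    landmarks: list of (mean, count) pairs for existing object landmarks.
    Returns normalized probabilities; the last entry is "create a new landmark".
    alpha, sigma and new_density are illustrative values, not from the paper.
    """
    # Existing landmark k: weight proportional to count_k * likelihood(z | k).
    weights = [count * gaussian_pdf(z, mean, sigma) for mean, count in landmarks]
    # New landmark: weight proportional to alpha * base density.
    weights.append(alpha * new_density)
    total = sum(weights)
    return [w / total for w in weights]


def branch_candidates(posterior, ratio_threshold=3.0):
    """Decide which associations to keep as hypothesis-tree branches.

    Assumption: a ratio test between the two most probable associations
    replaces the paper's hypothesis test. One index is returned when the
    association is unambiguous, two when the tree should branch.
    """
    order = sorted(range(len(posterior)), key=lambda i: posterior[i], reverse=True)
    if len(order) == 1:
        return [order[0]]
    best, second = order[0], order[1]
    if posterior[best] >= ratio_threshold * posterior[second]:
        return [best]
    return [best, second]


# Two existing landmarks one unit apart, each observed five times.
landmarks = [((0.0, 0.0, 0.0), 5), ((1.0, 0.0, 0.0), 5)]

# A measurement halfway between them is ambiguous, so two branches are kept.
ambiguous = branch_candidates(dp_association_posterior((0.5, 0.0, 0.0), landmarks))

# A measurement close to the first landmark yields a single association.
clear = branch_candidates(dp_association_posterior((0.05, 0.0, 0.0), landmarks))

# A measurement far from both landmarks is assigned to a new landmark.
novel = branch_candidates(dp_association_posterior((10.0, 10.0, 10.0), landmarks))
```

In this sketch the index `len(landmarks)` denotes a new landmark. Keeping only the top two candidates bounds the growth of the tree; a fuller system would also need to prune low-probability branches, which is not shown here.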
