Probabilistic data association for semantic SLAM

Traditional approaches to simultaneous localization and mapping (SLAM) rely on low-level geometric features such as points, lines, and planes. They are unable to assign semantic labels to landmarks observed in the environment. Furthermore, loop closure recognition based on low-level features is often viewpoint-dependent and subject to failure in ambiguous or repetitive environments. On the other hand, object recognition methods can infer landmark classes and scales, resulting in a small set of easily recognizable landmarks, ideal for view-independent unambiguous loop closure. In a map with several objects of the same class, however, a crucial data association problem exists. While data association and recognition are discrete problems usually solved using discrete inference, classical SLAM is a continuous optimization over metric information. In this paper, we formulate an optimization problem over sensor states and semantic landmark positions that integrates metric information, semantic information, and data associations, and decompose it into two interconnected problems: an estimation of discrete data association and landmark class probabilities, and a continuous optimization over the metric states. The estimated landmark and robot poses affect the association and class distributions, which in turn affect the robot-landmark pose optimization. The performance of our algorithm is demonstrated on indoor and outdoor datasets.

[1]  J. Munkres ALGORITHMS FOR THE ASSIGNMENT AND TRANSIORTATION tROBLEMS* , 1957 .

[2]  Evangelos E. Milios,et al.  Globally Consistent Range Scan Alignment for Environment Mapping , 1997, Auton. Robots.

[3]  Juan D. Tardós,et al.  Data association in stochastic mapping using the joint compatibility test , 2001, IEEE Trans. Robotics Autom..

[4]  Cipriano Galindo,et al.  Multi-hierarchical semantic maps for mobile robotics , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[5]  Hugh F. Durrant-Whyte,et al.  Simultaneous localization and mapping: part I , 2006, IEEE Robotics & Automation Magazine.

[6]  Luc Van Gool,et al.  Dynamic 3D Scene Analysis from a Moving Vehicle , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Stergios I. Roumeliotis,et al.  A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[8]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Wolfram Burgard,et al.  G2o: A general framework for graph optimization , 2011, 2011 IEEE International Conference on Robotics and Automation.

[10]  Andrzej Pronobis,et al.  Semantic Mapping with Mobile Robots , 2011 .

[11]  Javier Civera,et al.  Towards semantic SLAM using a monocular camera , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12]  Julius Ziegler,et al.  StereoScan: Dense 3d reconstruction in real-time , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[13]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[14]  Silvio Savarese,et al.  Semantic structure from motion , 2011, CVPR 2011.

[15]  Dieter Fox,et al.  RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments , 2012, Int. J. Robotics Res..

[16]  Frank Dellaert,et al.  iSAM2: Incremental smoothing and mapping using the Bayes tree , 2012, Int. J. Robotics Res..

[17]  F. Dellaert Factor Graphs and GTSAM: A Hands-on Introduction , 2012 .

[18]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  François Fleuret,et al.  Deformable Part Models with Individual Part Scaling , 2013, BMVC.

[20]  Dimitrios G. Kottas,et al.  Efficient and consistent vision-aided inertial navigation using line observations , 2013, 2013 IEEE International Conference on Robotics and Automation.

[21]  Dimitrios G. Kottas,et al.  Detecting and dealing with hovering maneuvers in vision-aided inertial navigation systems , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[22]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jörg Stückler,et al.  Dense real-time mapping of object-class semantics from RGB-D video , 2013, Journal of Real-Time Image Processing.

[24]  Davide Scaramuzza,et al.  SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[25]  Song-Chun Zhu,et al.  Single-View 3D Scene Parsing by Attributed Grammar , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  George J. Pappas,et al.  Active Deformable Part Models Inference , 2014, ECCV.

[27]  Ian D. Reid Towards semantic visual SLAM , 2014, 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV).

[28]  George J. Pappas,et al.  Semantic Localization Via the Matrix Permanent , 2014, Robotics: Science and Systems.

[29]  Roland Siegwart,et al.  A synchronized visual-inertial sensor system with FPGA pre-processing for accurate real-time SLAM , 2014, ICRA 2014.

[30]  Dimitrios G. Kottas,et al.  Consistency Analysis and Improvement of Vision-aided Inertial Navigation , 2014, IEEE Transactions on Robotics.

[31]  James M. Rehg,et al.  Joint Semantic Segmentation and 3D Reconstruction from Monocular Video , 2014, ECCV.

[32]  Jitendra Malik,et al.  Analyzing the Performance of Multilayer Neural Networks for Object Recognition , 2014, ECCV.

[33]  Frank Dellaert,et al.  IMU Preintegration on Manifold for Efficient Visual-Inertial Maximum-a-Posteriori Estimation , 2015, Robotics: Science and Systems.

[34]  John J. Leonard,et al.  Monocular SLAM Supported Object Recognition , 2015, Robotics: Science and Systems.

[35]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[37]  Antonios Gasteratos,et al.  Semantic mapping for mobile robotics tasks: A survey , 2015, Robotics Auton. Syst..

[38]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[39]  Patrick Pérez,et al.  Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[40]  Roland Siegwart,et al.  Robust visual inertial odometry using a direct EKF-based approach , 2015, IROS 2015.

[41]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Dorian Gálvez-López,et al.  Real-time Monocular Object SLAM , 2015, Robotics Auton. Syst..

[44]  George J. Pappas,et al.  Localization from semantic observations via the matrix permanent , 2016, Int. J. Robotics Res..

[45]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[46]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.