View-Invariant Loop Closure with Oriented Semantic Landmarks

Recent work on semantic simultaneous localization and mapping (SLAM) have shown the utility of natural objects as landmarks for improving localization accuracy and robustness. In this paper we present a monocular semantic SLAM system that uses object identity and inter-object geometry for view-invariant loop detection and drift correction. Our system's ability to recognize an area of the scene even under large changes in viewing direction allows it to surpass the mapping accuracy of ORB-SLAM, which uses only local appearance-based features that are not robust to large viewpoint changes. Experiments on real indoor scenes show that our method achieves mean drift reduction of 70% when compared directly to ORB-SLAM. Additionally, we propose a method for object orientation estimation, where we leverage the tracked pose of a moving camera under the SLAM setting to overcome ambiguities caused by object symmetry. This allows our SLAM system to produce geometrically detailed semantic maps with object orientation, translation, and scale.

[1]  Steven L. Waslander,et al.  Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Leonidas J. Guibas,et al.  ObjectNet3D: A Large Scale Database for 3D Object Recognition , 2016, ECCV.

[3]  Sean L. Bowman,et al.  A Unifying View of Geometry, Semantics, and Data Association in SLAM , 2018, IJCAI.

[4]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[5]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[6]  Wolfram Burgard,et al.  A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[8]  Michael Milford,et al.  QuadricSLAM: Dual Quadrics From Object Detections as Landmarks in Object-Oriented SLAM , 2018, IEEE Robotics and Automation Letters.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  D.J. Kriegman,et al.  Generic models for robot navigation , 1988, Proceedings. 1988 IEEE International Conference on Robotics and Automation.

[11]  Cipriano Galindo,et al.  Multi-hierarchical semantic maps for mobile robotics , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[13]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[14]  Silvio Savarese,et al.  Semantic structure from motion with points, regions, and objects , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[16]  James J. Little,et al.  Explicit Occlusion Reasoning for 3D Object Detection , 2011, BMVC.

[17]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Ian D. Reid,et al.  Real-Time Monocular Object-Model Aware Sparse SLAM , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[19]  Jimmy Li,et al.  Semantic Mapping for View-Invariant Relocalization , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[20]  Sean L. Bowman,et al.  Probabilistic data association for semantic SLAM , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[21]  Jonathan Tompson,et al.  Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning , 2018, NeurIPS.

[22]  Berthold K. P. Horn,et al.  Closed-form solution of absolute orientation using unit quaternions , 1987 .

[23]  Gregory Dudek,et al.  Synthetically Trained 3D Visual Tracker of Underwater Vehicles , 2018, OCEANS 2018 MTS/IEEE Charleston.

[24]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Jonathan P. How,et al.  SLAM with objects using a nonparametric pose graph , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Ashutosh Saxena,et al.  Learning 3-D object orientation from images , 2009, 2009 IEEE International Conference on Robotics and Automation.

[27]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[28]  John J. Leonard,et al.  Monocular SLAM Supported Object Recognition , 2015, Robotics: Science and Systems.

[29]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[30]  Achim J. Lilienthal,et al.  SIFT, SURF and Seasons: Long-term Outdoor Localization Using Local Features , 2007, EMCR.

[31]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.