CubeSLAM: Monocular 3D Object Detection and SLAM without Prior Models

We present a method for single-image 3D cuboid object detection and multi-view object SLAM without prior object models, and demonstrate that the two tasks can benefit each other. For 3D detection, we generate high-quality cuboid proposals from 2D bounding boxes and vanishing point sampling. The proposals are then scored and selected according to how well they align with image edges. Experiments on SUN RGBD and KITTI show improved efficiency and accuracy over existing approaches. In the second part, we propose multi-view bundle adjustment with novel measurement functions to jointly optimize camera poses, objects, and points, using the single-view detection results. Compared to points, objects provide more geometric constraints and scale consistency. On our collected dataset and the public TUM and KITTI odometry datasets, we achieve better pose estimation accuracy than state-of-the-art monocular SLAM while also improving 3D object detection accuracy.
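As a rough illustration of how vanishing points relate to a cuboid's orientation, the sketch below projects the three orthogonal cuboid axes through a pinhole camera to obtain their vanishing points. It is not the authors' implementation: the function names, the intrinsics, the level-camera assumption, and the choice to parameterize the object by a single yaw angle are all assumptions made for this example.

```python
import math

def vanishing_point(fx, fy, cx, cy, direction):
    """Project a 3D direction (camera frame) to its vanishing point in pixels.

    Returns None when the direction is parallel to the image plane,
    in which case the vanishing point lies at infinity.
    """
    dx, dy, dz = direction
    if abs(dz) < 1e-9:
        return None
    return (fx * dx / dz + cx, fy * dy / dz + cy)

def cuboid_vanishing_points(fx, fy, cx, cy, yaw):
    """Vanishing points of the three axes of a cuboid standing on the ground.

    Assumes (hypothetically) a level camera with x right, y down, z forward,
    and a cuboid rotated by `yaw` radians about the vertical (y) axis.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    axes = [
        (c, 0.0, s),      # first horizontal edge direction
        (-s, 0.0, c),     # second horizontal edge direction
        (0.0, 1.0, 0.0),  # vertical edge direction
    ]
    return [vanishing_point(fx, fy, cx, cy, a) for a in axes]

# Example intrinsics and yaw are made up for illustration.
vps = cuboid_vanishing_points(500.0, 500.0, 320.0, 240.0, math.radians(45))
```

Under these assumptions both horizontal vanishing points fall on the horizon row `y = cy`, and the vertical one lies at infinity. Sampling several yaw values would yield several vanishing point triples, each of which could be combined with a 2D bounding box to form a cuboid proposal.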
