Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

We propose a keypoint-based object-level SLAM framework that provides globally consistent 6DoF pose estimates for symmetric and asymmetric objects alike. To the best of our knowledge, our system is among the first to use the camera pose information from SLAM as prior knowledge for tracking keypoints on symmetric objects, ensuring that new measurements are consistent with the current 3D scene. Moreover, our semantic keypoint network is trained to predict a Gaussian covariance for each keypoint that captures the true error of the prediction. This covariance is useful not only as a weight for the residuals in the system's optimization problems, but also as a means to detect harmful statistical outliers without choosing a manual threshold. Experiments show that our method performs competitively with the state of the art in 6DoF object pose estimation, and at real-time speed. Our code, pre-trained models, and keypoint labels are available at https://github.com/rpng/suo_slam.
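
The two uses of a predicted keypoint covariance named above (residual weighting and threshold-free outlier detection) can be illustrated with a minimal sketch. It assumes a 2D pixel residual with a 2x2 covariance and uses a standard chi-square gate on the squared Mahalanobis distance. The function names, the 95% confidence level, and the gating test itself are illustrative assumptions, not taken from the released code.

```python
import numpy as np


def chi2_gate_2dof(confidence=0.95):
    """Chi-square quantile for 2 degrees of freedom.

    With 2 DoF the chi-square CDF is 1 - exp(-x / 2), so the quantile has a
    closed form and no statistics library is needed.
    """
    return -2.0 * np.log(1.0 - confidence)


def whitened_residual(predicted_uv, measured_uv, cov):
    """Weight a 2D keypoint residual by its predicted covariance.

    Returns L^{-1} r, where cov = L L^T (Cholesky factor). The squared norm
    of the result is the squared Mahalanobis distance of the residual.
    """
    r = np.asarray(measured_uv, dtype=float) - np.asarray(predicted_uv, dtype=float)
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))
    return np.linalg.solve(L, r)


def is_inlier(predicted_uv, measured_uv, cov, confidence=0.95):
    """Accept a measurement if it lies inside the chi-square confidence gate.

    The gate follows from the Gaussian assumption and a chosen confidence
    level (hypothetical default: 95%), not from a hand-tuned pixel distance.
    """
    w = whitened_residual(predicted_uv, measured_uv, cov)
    return float(w @ w) <= chi2_gate_2dof(confidence)


if __name__ == "__main__":
    cov = np.diag([4.0, 4.0])  # hypothetical predicted covariance, in pixels^2
    print(is_inlier([100.0, 100.0], [101.0, 100.5], cov))  # small residual
    print(is_inlier([100.0, 100.0], [110.0, 100.0], cov))  # 5-sigma residual
```

The whitened residual is the quantity a nonlinear least-squares solver would minimize, so the same covariance serves both purposes: it down-weights uncertain keypoints in the optimization and sets the scale of the outlier test.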
