Consensus-Informed Optimization Over Mixtures for Ambiguity-Aware Object SLAM

Building object-level maps can facilitate robot-environment interactions (e.g., planning and manipulation), but objects often have multiple probable poses when viewed from a single vantage point, due to symmetry, occlusion, or perceptual failures. A robust object-level simultaneous localization and mapping (object SLAM) algorithm needs to be aware of this pose ambiguity. We propose to maintain and subsequently disambiguate the multiple pose interpretations to gradually recover a globally consistent world representation. The max-mixtures model is applied to implicitly and efficiently track all pose hypotheses, but the resulting formulation is non-convex and therefore subject to local optima. To mitigate this problem, temporally consistent hypotheses are extracted and used to guide the optimization toward the global optimum. This consensus-informed inference method is applied online via landmark variable re-initialization within an incremental SLAM framework, iSAM2, for robust real-time performance. We demonstrate that this approach improves SLAM performance on both simulated and real object SLAM problems with pose ambiguity.
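
To illustrate the max-mixtures idea referred to in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a single ambiguous yaw angle with Gaussian hypotheses; the names `wrap_angle`, `max_mixture_cost`, and the `(mean, sigma, weight)` tuple layout are hypothetical. The cost of an estimate is the negative log-likelihood of the single most likely component, which is why the objective is non-convex: each hypothesis contributes its own basin.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def max_mixture_cost(x, hypotheses):
    """Evaluate a 1-D max-mixture cost at yaw estimate x.

    hypotheses: list of (mean, sigma, weight) tuples (assumed layout),
    one per candidate yaw reported by the pose estimator.
    Returns (cost, index) for the dominant component, where
    cost = -log(weight * N(x; mean, sigma^2)).
    """
    best_cost, best_idx = math.inf, -1
    for i, (mean, sigma, weight) in enumerate(hypotheses):
        r = wrap_angle(x - mean)  # angular residual to this hypothesis
        cost = (
            0.5 * (r / sigma) ** 2
            + math.log(sigma * math.sqrt(2 * math.pi))
            - math.log(weight)
        )
        if cost < best_cost:  # max over likelihoods == min over costs
            best_cost, best_idx = cost, i
    return best_cost, best_idx


# An object with 180-degree symmetry: two equally weighted yaw hypotheses.
hyps = [(0.0, 0.1, 0.5), (math.pi, 0.1, 0.5)]
cost_a, idx_a = max_mixture_cost(0.05, hyps)  # near the first hypothesis
cost_b, idx_b = max_mixture_cost(3.0, hyps)   # near the second hypothesis
```

Which component is selected depends on the current estimate, so a poor initial value can lock the optimizer into the wrong hypothesis. This sketch does not cover the consensus step or the re-initialization within iSAM2 described in the abstract.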
