Fast Uncertainty Quantification for Deep Object Pose Estimation

Deep learning-based object pose estimators are often unreliable and overconfident, especially when the input image lies outside the training domain, for instance under sim2real transfer. Efficient and robust uncertainty quantification (UQ) for pose estimators is critically needed in many robotic tasks. In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation. We ensemble 2–3 pre-trained models with different neural network architectures and/or training data sources, and compute the average pairwise disagreement among their pose predictions as the uncertainty estimate. We propose four disagreement metrics, including a learned metric, and show that the average distance (ADD) is the best learning-free metric and is only slightly worse than the learned metric, which requires labeled target data. Our method has several advantages over the prior art: 1) it does not require any modification of the training process or the model inputs; and 2) it needs only one forward pass per model. We evaluate the proposed UQ method on three tasks, where our uncertainty estimates correlate much more strongly with pose estimation errors than those of the baselines. Moreover, in a real robot grasping task, our method increases the grasping success rate from 35% to 90%. Video and code are available at https://sites.google.com/view/fastuq.
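The ensemble disagreement described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `add_distance` and `ensemble_uncertainty`, the representation of each pose as a `(R, t)` tuple of a 3x3 rotation matrix and a 3-vector translation, and the `points` array of 3D object model points are all assumptions made for illustration. The sketch computes the ADD between every pair of predicted poses and averages the results.

```python
from itertools import combinations

import numpy as np


def add_distance(pose_a, pose_b, points):
    """Average distance (ADD) between model points placed by two poses.

    Each pose is an assumed (R, t) tuple: R is a 3x3 rotation matrix,
    t is a length-3 translation. `points` is an (N, 3) array of 3D
    object model points.
    """
    (rot_a, trans_a), (rot_b, trans_b) = pose_a, pose_b
    pts_a = points @ rot_a.T + trans_a  # model points under pose a
    pts_b = points @ rot_b.T + trans_b  # model points under pose b
    return float(np.linalg.norm(pts_a - pts_b, axis=1).mean())


def ensemble_uncertainty(poses, points):
    """Average pairwise ADD disagreement over an ensemble's predictions.

    `poses` holds one predicted pose per ensemble member (at least two).
    """
    pairs = list(combinations(poses, 2))
    if not pairs:
        raise ValueError("need at least two pose predictions")
    return sum(add_distance(a, b, points) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical example: three models that agree on rotation but
    # disagree on translation along the x axis (units are arbitrary).
    model_points = np.random.default_rng(0).uniform(-0.05, 0.05, size=(100, 3))
    identity = np.eye(3)
    predictions = [
        (identity, np.array([0.0, 0.0, 0.5])),
        (identity, np.array([0.1, 0.0, 0.5])),
        (identity, np.array([0.2, 0.0, 0.5])),
    ]
    print(ensemble_uncertainty(predictions, model_points))
```

A downstream task could compare this value against a threshold chosen on held-out data, for example skipping a grasp when the ensemble members disagree strongly; the threshold and the decision rule are left to the application.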
