Global Hypothesis Generation for 6D Object Pose Estimation

This paper addresses the task of estimating the 6D pose of a known 3D object from a single RGB-D image. Most modern approaches solve this task in three steps: i) compute local features, ii) generate a pool of pose hypotheses, iii) select and refine a pose from the pool. This work focuses on the second step. While all existing approaches generate the hypothesis pool via local reasoning, e.g. RANSAC or Hough voting, we are the first to show that global reasoning is beneficial at this stage. In particular, we formulate a novel fully-connected Conditional Random Field (CRF) that outputs a very small number of pose hypotheses. Although the potential functions of the CRF are non-Gaussian, we give a new, efficient two-step optimization procedure with some optimality guarantees. We use our global hypothesis generation procedure to produce results that exceed the state of the art on the challenging Occluded Object Dataset.
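For orientation, the following is a minimal sketch of the local, RANSAC-style hypothesis generation that the abstract contrasts with; it is not the CRF-based procedure proposed in the paper. It assumes that step i) has already produced 3D-3D correspondences between object coordinates and camera coordinates, and it fits each hypothesis to three sampled correspondences with the standard Kabsch least-squares alignment. The function names, the NumPy dependency, and the inlier tolerance (taken to be in the same units as the points) are illustrative assumptions.

```python
import numpy as np


def kabsch(obj_pts, cam_pts):
    """Least-squares rigid transform (R, t) with cam_pts ~= obj_pts @ R.T + t."""
    obj_c = obj_pts.mean(axis=0)
    cam_c = cam_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (obj_pts - obj_c).T @ (cam_pts - cam_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cam_c - R @ obj_c
    return R, t


def sample_hypotheses(obj_pts, cam_pts, n_hyp, rng):
    """Local reasoning: each hypothesis is fitted to 3 random correspondences."""
    hyps = []
    for _ in range(n_hyp):
        idx = rng.choice(len(obj_pts), size=3, replace=False)
        hyps.append(kabsch(obj_pts[idx], cam_pts[idx]))
    return hyps


def inlier_count(R, t, obj_pts, cam_pts, tol=0.01):
    """Number of correspondences explained by the pose (tol is an assumed threshold)."""
    residual = np.linalg.norm(obj_pts @ R.T + t - cam_pts, axis=1)
    return int((residual < tol).sum())


def make_example(seed=0, n=20):
    """Synthetic, noise-free correspondences under a known pose (illustration only)."""
    rng = np.random.default_rng(seed)
    a = 0.3  # rotation angle about the z axis, in radians
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a), np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
    t = np.array([0.1, -0.2, 0.5])
    obj_pts = rng.normal(size=(n, 3))
    cam_pts = obj_pts @ R.T + t
    return R, t, obj_pts, cam_pts, rng
```

A pool produced this way usually has to be large, because every hypothesis depends on only three correspondences and a single outlier among them spoils it. The abstract's argument is that reasoning over all correspondences jointly can reduce the pool to a very small number of hypotheses.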
