A Segmentation-Driven Approach for 6D Object Pose Estimation in the Crowd

Estimating the 6D pose of object instances in the crowd (scenes with multiple object instances, severe foreground occlusions, and background distractors) has attracted considerable research attention in recent years because such scenes are common in industrial applications. In this work, we present a segmentation-driven approach to recover the 6D object pose in the crowd. First, the convolutional neural network framework Mask R-CNN is applied to obtain masks and bounding boxes of the target object instances in the scene image. Then, each bounding box is divided into smaller patches with a sliding window. Next, a Sparse Autoencoder extracts invariant features from these patches, and Hough voting over these features yields several rough candidate poses. Finally, the Iterative Closest Point (ICP) method is used to refine the 6D object pose. We tested our approach on the commonly used LINEMOD dataset [1]. Experimental results show that our approach achieves high accuracy and is robust to foreground occlusions and background distractors.
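
The patch-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_patches`, the box convention `(x0, y0, x1, y1)` with exclusive upper bounds, and the `patch_size` and `stride` values are all assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np


def extract_patches(image, box, patch_size=8, stride=4):
    """Slide a square window over a bounding box and collect the patches.

    image: H x W (or H x W x C) array.
    box: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive (assumed convention).
    Returns (patches, centers): patches stacked along axis 0, and the
    (x, y) center of each patch in image coordinates.
    """
    x0, y0, x1, y1 = box
    patches, centers = [], []
    # Only windows that fit entirely inside the box are kept.
    for y in range(y0, y1 - patch_size + 1, stride):
        for x in range(x0, x1 - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            centers.append((x + patch_size / 2.0, y + patch_size / 2.0))
    if not patches:
        # Box smaller than one patch: return empty arrays of matching rank.
        empty_shape = (0, patch_size, patch_size) + image.shape[2:]
        return np.empty(empty_shape), np.empty((0, 2))
    return np.stack(patches), np.array(centers)


if __name__ == "__main__":
    # Synthetic 32 x 32 single-channel image and a 16 x 16 box (hypothetical values).
    image = np.arange(32 * 32).reshape(32, 32)
    patches, centers = extract_patches(image, (4, 4, 20, 20))
    print(patches.shape, centers[0])
```

In a pipeline like the one described, each patch would then be passed to the feature extractor, and the patch centers would serve as the image locations from which votes are cast; both of those stages are outside the scope of this sketch.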

[1] Vincent Lepetit, et al. Gradient Response Maps for Real-Time Detection of Textureless Objects, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Federico Tombari, et al. BOLD Features to Detect Texture-less Objects, 2013, 2013 IEEE International Conference on Computer Vision.

[3] Manabu Hashimoto, et al. Fast 6D Pose Estimation from a Monocular Image Using Hierarchical Pose Trees, 2016, ECCV.

[4] Luc Van Gool, et al. SURF: Speeded Up Robust Features, 2006, ECCV.

[5] Tae-Kyun Kim, et al. Latent-Class Hough Forests for 3D Object Detection and Pose Estimation, 2014, ECCV.

[6] Vincent Lepetit, et al. Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes, 2011, 2011 International Conference on Computer Vision.

[7] Markus Ulrich, et al. Introducing MVTec ITODD - A Dataset for 3D Object Recognition in Industry, 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[8] Tae-Kyun Kim, et al. Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Manolis I. A. Lourakis, et al. Detection and fine 3D pose estimation of texture-less objects in RGB-D images, 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10] Chyi-Yeu Lin, et al. 6D pose estimation using an improved method based on point pair features, 2018, 2018 4th International Conference on Control, Automation and Robotics (ICCAR).

[11] Nassir Navab, et al. Model globally, match locally: Efficient and robust 3D object recognition, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12] Gunilla Borgefors, et al. Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm, 1988, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Nassir Navab, et al. SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14] Gary R. Bradski, et al. Fast 3D recognition and pose using the Viewpoint Feature Histogram, 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15] Markus Ulrich, et al. Combining Scale-Space and Similarity-Based Aspect Graphs for Fast 3D Object Recognition, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Dariu Gavrila, et al. A Bayesian, Exemplar-Based Approach to Hierarchical Shape Matching, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Eric Brachmann, et al. Learning 6D Object Pose Estimation Using 3D Object Coordinates, 2014, ECCV.

[18] Eric Brachmann, et al. Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Tinne Tuytelaars, et al. Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach, 2013, 2013 IEEE International Conference on Computer Vision.

[20] Nico Blodow, et al. CAD-model recognition and 6DOF pose estimation using 3D cues, 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[21] Eric Brachmann, et al. Learning 6D Object Pose Estimation using 3D Object Coordinates - Supplementary Material, 2014.

[22] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.

[23] Gary R. Bradski, et al. ORB: An efficient alternative to SIFT or SURF, 2011, 2011 International Conference on Computer Vision.

[24] Dirk Kraft, et al. Rotational Subgroup Voting and Pose Clustering for Robust 3D Object Recognition, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25] Trevor Darrell, et al. Learning to Segment Every Thing, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26] Nassir Navab, et al. Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation, 2016, ECCV.