Recovering 6D Object Pose: A Review and Multi-modal Analysis

A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing how challenges such as occlusion, clutter, and texture affect the performance of methods that operate on the RGB modality. By also interpreting depth data, this paper presents a thorough multi-modal analysis. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images and compares the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take to improve "autonomy" in robotics while handling objects? Our findings include: (i) Reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering the 6D pose of instances. (iii) Template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation. The recent paradigm is to learn deep discriminative feature representations and to adopt CNNs that take RGB images as input. (iv) If large-scale 6D-annotated depth datasets are available, feature representations can be learnt on these datasets, and the learnt representations can then be customized for the 6D problem.
