A Review on Object Pose Recovery: from 3D Bounding Box Detectors to Full 6D Pose Estimators

Object pose recovery has gained increasing attention in the computer vision field, as it has become an important problem in rapidly evolving technological areas related to autonomous driving, robotics, and augmented reality. Existing reviews have addressed the problem at the visual level in 2D, covering methods that produce 2D bounding boxes of objects of interest in RGB images. The 2D search space is enlarged either by using the geometry information available in 3D space along with RGB (mono/stereo) images, or by using depth data from LIDAR sensors and/or RGB-D cameras. 3D bounding box detectors, which produce category-level amodal 3D bounding boxes, are evaluated on gravity-aligned images, while full 6D object pose estimators are mostly tested at the instance level on images where the alignment constraint is removed. Recently, 6D object pose estimation has also been tackled at the level of categories. In this paper, we present the first comprehensive and most recent review of methods for object pose recovery, from 3D bounding box detectors to full 6D pose estimators. The methods mathematically model the problem as a classification, regression, classification and regression, template matching, or point-pair feature matching task. Based on this, a mathematical-model-based categorization of the methods is established. Datasets used for evaluating the methods are investigated with respect to their challenges, and evaluation metrics are studied. Quantitative results of experiments in the literature are analyzed to show which category of methods performs best under which types of challenges. The analyses are further extended by comparing two methods, both our own implementations, so that the outcomes from the public results are further solidified. The current state of the field of object pose recovery is summarized, and possible research directions are identified.
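To make the distinction between gravity-aligned 3D bounding boxes and full 6D poses concrete, the following minimal sketch contrasts the two parameterizations. It is an illustration only and is not taken from any of the reviewed methods. It assumes that gravity points along the z axis, that a gravity-aligned rotation reduces to a single yaw angle, and that a full rotation is composed from roll, pitch, and yaw in z-y-x order. All function names are hypothetical.

```python
import math

def rot_x(a):
    """Rotation matrix about the x axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    """Rotation matrix about the y axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    """Rotation matrix about the z axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def gravity_aligned_pose(x, y, z, yaw):
    """Pose with 4 degrees of freedom: translation plus yaw about the
    assumed gravity axis (z). Roll and pitch are fixed to zero."""
    return rot_z(yaw), (x, y, z)

def full_6d_pose(x, y, z, roll, pitch, yaw):
    """Pose with 6 degrees of freedom: translation plus an unconstrained
    rotation, composed here (by assumption) as Rz(yaw) Ry(pitch) Rx(roll)."""
    rotation = matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))
    return rotation, (x, y, z)

def transform(pose, point):
    """Map a point from object coordinates to camera/world coordinates."""
    rotation, translation = pose
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )

def rotate_only(pose, direction):
    """Apply only the rotational part of a pose to a direction vector."""
    rotation, _ = pose
    return tuple(sum(rotation[i][k] * direction[k] for k in range(3))
                 for i in range(3))

up = (0.0, 0.0, 1.0)
aligned = gravity_aligned_pose(1.0, 2.0, 3.0, yaw=math.pi / 2)
tilted = full_6d_pose(1.0, 2.0, 3.0, roll=math.pi / 2, pitch=0.0, yaw=0.0)

up_after_aligned = rotate_only(aligned, up)  # object's up axis stays vertical
up_after_tilted = rotate_only(tilted, up)    # object's up axis is tipped over
```

Under these assumptions, the gravity-aligned pose leaves the object's up axis unchanged for any yaw, whereas the full 6D pose can tilt it arbitrarily, which is the extra freedom that 6D pose estimators must recover once the alignment constraint is removed.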
