Paper Information - Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview

Object pose detection and tracking has recently attracted increasing attention due to its wide range of applications in areas such as autonomous driving, robotics, and augmented reality. Among the methods for object pose detection and tracking, deep learning is the most promising and has shown better performance than the alternatives. However, there is a lack of survey studies on the latest developments in deep learning-based methods. This paper therefore presents a comprehensive review of recent progress in deep learning-based object pose detection and tracking. To allow a more thorough treatment, the scope of the paper is limited to methods that take monocular RGB/RGBD data as input, covering three major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. Metrics, datasets, and methods for both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with observations and directions for future research.
