Object 6D pose estimation with non-local attention

In this paper, we address the challenging task of estimating the 6D pose of objects from a single RGB image. Motivated by deep learning-based object detection methods, we propose a concise and efficient network that integrates 6D object pose parameter estimation into an object detection framework. Furthermore, to make the estimation more robust to occlusion, we introduce a non-local self-attention module. Experimental results show that the proposed method achieves state-of-the-art performance on the YCB-Video and LINEMOD datasets.
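
The abstract does not specify the exact form of the non-local self-attention module. The sketch below assumes the embedded-Gaussian non-local block of Wang et al. (Non-local Neural Networks, CVPR 2018), written in NumPy for a single C×H×W feature map. The function and weight names (`non_local_block`, `w_theta`, `w_phi`, `w_g`, `w_z`) are hypothetical, the 1×1 convolutions are written as matrix products, and batch normalization is omitted. Each output position is a softmax-weighted sum over all spatial positions rather than only a local neighborhood, which is the usual intuition for why such a module may help when parts of an object are occluded.

```python
import numpy as np

# Hypothetical sketch of an embedded-Gaussian non-local block (Wang et al., 2018).
# Names and shapes are illustrative assumptions, not the paper's implementation.


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Apply a non-local block to one feature map.

    x: (C, H, W) feature map.
    w_theta, w_phi, w_g: (C_inner, C) weights of the 1x1 query/key/value projections.
    w_z: (C, C_inner) weight of the 1x1 output projection.
    Returns (z, attn): z has the same shape as x, attn is the (H*W, H*W) attention map.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w).T            # (N, C), one row per spatial position
    theta = flat @ w_theta.T                # (N, C_inner) queries
    phi = flat @ w_phi.T                    # (N, C_inner) keys
    g = flat @ w_g.T                        # (N, C_inner) values
    attn = softmax(theta @ phi.T, axis=-1)  # (N, N): every position attends to every position
    y = attn @ g                            # (N, C_inner) aggregated non-local context
    z = y @ w_z.T + flat                    # (N, C) output projection plus residual connection
    return z.T.reshape(c, h, w), attn


# Usage on a random feature map.
rng = np.random.default_rng(0)
C, H, W, C_inner = 8, 4, 5, 4
x = rng.standard_normal((C, H, W))
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((C_inner, C)) for _ in range(3))
w_z = np.zeros((C, C_inner))  # zero init: the block starts as an identity mapping
z, attn = non_local_block(x, w_theta, w_phi, w_g, w_z)
print(z.shape, attn.shape)  # (8, 4, 5) (20, 20)
```

Zero-initializing `w_z` makes the block an identity mapping at the start of training, mirroring the zero-initialized batch-norm scale used by Wang et al., so it can be inserted into a pretrained detection backbone without changing its initial output. Note that the attention map is N×N with N = H·W, so its cost grows quadratically with feature-map size.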