Object 6D pose estimation with non-local attention

In this paper, we address the challenging task of estimating 6D object poses from a single RGB image. Motivated by the deep learning-based object detection methods, we propose a concise and efficient network that integrates 6D object pose parameter estimation into the object detection framework. Furthermore, for more robust estimation to occlusion, a nonlocal self-attention module is introduced. The experimental results show that the proposed method reaches the state-ofthe-art performance on the YCB-video and the Linemod datasets.

[1]  Xudong Jiang,et al.  Bi-Directional Dermoscopic Feature Learning and Multi-Scale Consistent Decision Fusion for Skin Lesion Segmentation , 2019, IEEE Transactions on Image Processing.

[2]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[3]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[5]  J. Dai An historical review of the theoretical development of rigid body displacements from Rodrigues parameters to the finite twist , 2006 .

[6]  Andrew Zisserman,et al.  Multiple View Geometry in Computer Vision (2nd ed) , 2003 .

[7]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Sven J. Dickinson,et al.  3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model , 2012, NIPS.

[9]  Dieter Fox,et al.  Learning hierarchical sparse features for RGB-(D) object recognition , 2014, Int. J. Robotics Res..

[10]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[13]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Xudong Jiang,et al.  DeepDeblur: text image recovery from blur to sharp , 2019, Multimedia Tools and Applications.

[15]  Gang Wang,et al.  Feature Boosting Network For 3D Pose Estimation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Jean-Michel Morel,et al.  A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Andrew Zisserman,et al.  Multiple view geometry in computer visiond , 2001 .

[18]  James M. Rehg,et al.  3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Siddhartha S. Srinivasa,et al.  The MOPED framework: Object recognition and pose estimation for manipulation , 2011, Int. J. Robotics Res..

[20]  Silvio Savarese,et al.  Data-driven 3D Voxel Patterns for object category recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Xudong Jiang,et al.  Dermoscopic Image Segmentation Through the Enhanced High-Level Parsing and Class Weighted Loss , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[22]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[23]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[24]  Jana Kosecka,et al.  Fast Single Shot Detection and Pose Estimation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[25]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Gang Wang,et al.  Boundary-Aware Feature Propagation for Scene Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Xudong Jiang,et al.  Semantic Segmentation With Context Encoding and Multi-Path Decoding , 2020, IEEE Transactions on Image Processing.

[28]  Xudong Jiang,et al.  Semantic Correlation Promoted Shape-Variant Context for Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Gang Wang,et al.  Toward Achieving Robust Low-Level and High-Level Scene Parsing , 2019, IEEE Transactions on Image Processing.

[31]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[32]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[33]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[34]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Konrad Schindler,et al.  Are Cars Just 3D Boxes? Jointly Estimating the 3D Shape of Multiple Objects , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Gang Wang,et al.  Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Nassir Navab,et al.  Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation , 2016, ECCV.

[40]  Sven Behnke,et al.  RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[41]  Bernt Schiele,et al.  Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.