A Dynamic Keypoints Selection Network for 6DoF Pose Estimation

6DoF pose estimation aims to estimate the rotation and translation between two coordinate systems, such as the object coordinate system and the camera coordinate system. Although deep learning has brought advances, making full use of scene information remains an open problem. Prior works tackle the problem with pixel-wise feature fusion but must randomly select numerous points from the image, which cannot deliver fast inference and accurate pose estimation at the same time. In this work, we present a novel deep neural network based on dynamic keypoints selection, designed for 6DoF pose estimation from a single RGBD image. Our network has three parts: instance semantic segmentation, edge points detection, and 6DoF pose estimation. Given an RGBD image, the network is trained to predict per-pixel categories and the translation offsets to edge points and center points. A least-squares fitting is then applied to estimate the 6DoF pose parameters. Specifically, we propose a dynamic keypoints selection algorithm that chooses keypoints from the foreground feature map, which allows us to leverage both geometric and appearance information. During 6DoF pose estimation, we use the instance semantic segmentation result to filter out background points and use only foreground points for edge points detection and 6DoF pose estimation. Experiments on two commonly used 6DoF pose estimation benchmark datasets, YCB-Video and LineMOD, demonstrate that our method outperforms state-of-the-art methods and is significantly more time-efficient than other methods in the same category.
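The least-squares fitting step can be illustrated with the standard SVD-based closed-form solution for aligning two 3D point sets. The abstract does not spell out its exact solver, so the following is a minimal sketch under assumptions: the function name `fit_rigid_transform` and its arguments are hypothetical, `src` is taken to hold keypoints in the object coordinate system, and `dst` the corresponding predicted keypoints in the camera coordinate system.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding 3D points (hypothetical inputs:
    object-frame keypoints and their predicted camera-frame positions).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Center both point sets so that only the rotation remains to be solved.
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    src_centered = src - src_centroid
    dst_centered = dst - dst_centroid

    # 3x3 cross-covariance matrix between the centered sets.
    H = src_centered.T @ dst_centered
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    det = np.linalg.det(Vt.T @ U.T)
    D = np.diag([1.0, 1.0, 1.0 if det >= 0 else -1.0])
    R = Vt.T @ D @ U.T

    # Translation maps the rotated source centroid onto the target centroid.
    t = dst_centroid - R @ src_centroid
    return R, t
```

In a pipeline like the one described, `dst` would presumably be assembled from the network's predicted edge points and center point after background points are filtered out; that wiring is an assumption and is not shown here.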
