CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds

In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-part pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. Our pipeline is composed of three modules: 1) a pose canonicalization module that normal*: equal contributions, †: corresponding author Project page: https://yijiaweng.github.io/CAPTRA izes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, thus yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCSREAL275 [29]) and articulated object pose benchmarks (SAPIEN [34], BMVC [18]) at the fastest FPS ⇠ 12.

[1]  Stefan Schaal,et al.  Probabilistic object tracking using a range camera , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[2]  Eric Brachmann,et al.  Pose Estimation of Kinematic Chain Instances via Object Coordinate Regression , 2015, BMVC.

[3]  Vladlen Koltun,et al.  Open3D: A Modern Library for 3D Data Processing , 2018, ArXiv.

[4]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Silvio Savarese,et al.  6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[6]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[7]  Eric Brachmann,et al.  Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Kostas E. Bekris,et al.  se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[9]  Yi Zhou,et al.  On the Continuity of Rotation Representations in Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[11]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[12]  Shiyang Lu,et al.  Factored Pose Estimation of Articulated Objects using Efficient Nonparametric Belief Propagation , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[13]  Chengde Wan,et al.  MEgATrack , 2020, ACM Trans. Graph..

[14]  Leonidas J. Guibas,et al.  SAPIEN: A SimulAted Part-Based Interactive ENvironment , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Bernt Schiele,et al.  PoseTrack: A Benchmark for Human Pose Estimation and Tracking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Xiangyang Ji,et al.  CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning , 2020 .

[17]  Leonidas J. Guibas,et al.  Deep part induction from articulated object pairs , 2018, ACM Trans. Graph..

[18]  Timothy Bretl,et al.  PoseRBPF: A Rao–Blackwellized Particle Filter for 6-D Object Pose Tracking , 2019, IEEE Transactions on Robotics.

[19]  Eric Brachmann,et al.  6-DOF Model Based Tracking via Object Coordinate Regression , 2014, ACCV.

[20]  Christian Theobalt,et al.  GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[22]  Dieter Fox,et al.  DART: Dense Articulated Real-Time Tracking , 2014, Robotics: Science and Systems.

[23]  Gim Hee Lee,et al.  Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation , 2020, ECCV.

[24]  Luigi di Stefano,et al.  On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Henrik I. Christensen,et al.  RGB-D object tracking: A particle filter approach on GPU , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[26]  Adrien Gaidon,et al.  ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  S. Umeyama,et al.  Least-Squares Estimation of Transformation Parameters Between Two Point Patterns , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[29]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[30]  Kai Xu,et al.  Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Yi Li,et al.  DeepIM: Deep Iterative Matching for 6D Pose Estimation , 2018, International Journal of Computer Vision.

[32]  A. Lynn Abbott,et al.  Category-Level Articulated Object Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Eric Brachmann,et al.  DSAC — Differentiable RANSAC for Camera Localization , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Robert Y. Wang,et al.  Real-time hand-tracking with a color glove , 2009, ACM Trans. Graph..

[35]  Maher Moakher,et al.  To appear in: SIAM J. MATRIX ANAL. APPL. MEANS AND AVERAGING IN THE GROUP OF ROTATIONS∗ , 2002 .

[36]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Jie Song,et al.  Category Level Object Pose Estimation via Neural Analysis-by-Synthesis , 2020, ECCV.