LieNet: Real-time Monocular Object Instance 6D Pose Estimation

In this work, we present LieNet, a novel deep learning framework that simultaneously detects and segments multiple object instances and estimates their 6D poses from a single RGB image, without requiring additional post-processing. Our system is accurate and fast (∼10 fps), making it well suited for real-time applications. In particular, LieNet detects and segments object instances in a manner analogous to modern instance segmentation networks such as Mask R-CNN, but adds a novel sub-network for 6D pose estimation. LieNet estimates the rotation matrix of an object by regressing a Lie algebra based rotation representation, and estimates the translation vector by predicting the distance of the object to the camera center. Experiments on two standard pose estimation benchmarks show that LieNet greatly outperforms other recent CNN-based pose prediction methods when they are used with monocular images and without post-refinement.
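
The two pose mappings named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the regressed Lie algebra vector is an so(3) axis-angle vector (in radians) that is mapped to a rotation matrix by the exponential map (Rodrigues' formula). For the translation, it assumes that the object's projected 2D center (u, v) and the pinhole intrinsics fx, fy, cx, cy are known, and that "distance" means the Euclidean distance from the camera center. The function names so3_exp, so3_log and translation_from_distance are hypothetical.

```python
import math


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(omega):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula.

    omega is an axis-angle vector: its direction is the rotation axis and its
    norm theta is the angle in radians.
    R = I + sin(theta) K + (1 - cos(theta)) K^2, with K the skew-symmetric
    matrix of the unit axis.
    """
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    if theta < 1e-12:
        # Negligible rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = wx / theta, wy / theta, wz / theta
    K = [[0.0, -kz, ky],
         [kz, 0.0, -kx],
         [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]


def so3_log(R):
    """Logarithm map SO(3) -> so(3), e.g. to convert ground-truth rotation
    matrices into regression targets. Valid for angles in [0, pi); the case
    theta close to pi needs separate handling and is omitted here."""
    cos_t = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against round-off
    if theta < 1e-12:
        return [0.0, 0.0, 0.0]
    scale = theta / (2.0 * math.sin(theta))
    return [scale * (R[2][1] - R[1][2]),
            scale * (R[0][2] - R[2][0]),
            scale * (R[1][0] - R[0][1])]


def translation_from_distance(distance, center_uv, fx, fy, cx, cy):
    """Hypothetical helper: recover the translation vector from a predicted
    Euclidean distance between the camera center and the object origin.
    Assumes the projected 2D object center (u, v) and the pinhole intrinsics
    are known. If depth along the optical axis (t_z) is predicted instead,
    skip the normalization and return depth * ray."""
    u, v = center_uv
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)  # back-projected viewing ray
    norm = math.sqrt(sum(r * r for r in ray))
    return [distance * r / norm for r in ray]


if __name__ == "__main__":
    # Illustrative values only; the intrinsics are made up.
    R = so3_exp([0.0, 0.0, math.pi / 2])  # 90 degrees about the z axis
    t = translation_from_distance(2.0, (570.0, 240.0), 500.0, 500.0, 320.0, 240.0)
    print(R)
    print(t)
```

Regressing the three so(3) components keeps the network output unconstrained, because every vector in R^3 maps to a valid rotation under the exponential map. This is one common motivation for Lie algebra parameterizations over, for example, directly regressing the nine entries of a rotation matrix, which would then need re-orthonormalization.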
