Deep Model-Based 6D Pose Refinement in RGB

We present a novel approach for model-based 6D pose refinement in color data. Building on the established idea of contour-based pose tracking, we teach a deep neural network to predict a translational and rotational update. At the core, we propose a new visual loss that drives the pose update by aligning object contours, thus avoiding the definition of any explicit appearance model. In contrast to previous work our method is correspondence-free, segmentation-free, can handle occlusion and is agnostic to geometrical symmetry as well as visual ambiguities. Additionally, we observe a strong robustness towards rough initialization. The approach can run in real-time and produces pose accuracies that come close to 3D ICP without the need for depth data. Furthermore, our networks are trained from purely synthetic data and will be published together with the refinement code at http://campar.in.tum.de/Main/FabianManhardt to ensure reproducibility.

[1]  Bodo Rosenhahn,et al.  Region-Based Pose Tracking , 2007, IbPRIA.

[2]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Rami R. Hagege,et al.  2D-3D Pose Estimation of Heterogeneous Objects Using a Region Based Approach , 2015, International Journal of Computer Vision.

[4]  Shinji Uchiyama Model-Based 3D Object Tracking with Online Texture Update , 2009, MVA.

[5]  Nassir Navab,et al.  A Versatile Learning-Based 3D Temporal Tracker: Scalable, Robust, Online , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Yuandong Tian,et al.  Single Image 3D Interpreter Network , 2016, ECCV.

[7]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Marios Savvides,et al.  Faster than Real-Time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Xiaowei Zhou,et al.  6-DoF object pose from semantic keypoints , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[10]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[11]  Vincent Lepetit,et al.  Stable real-time 3D tracking using online and offline information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Bodo Rosenhahn,et al.  Region-based pose tracking with occlusions using 3D models , 2010, Machine Vision and Applications.

[13]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[14]  Vincent Lepetit,et al.  On Pre-Trained Image Features and Synthetic Images for Deep Learning , 2017, ECCV Workshops.

[15]  Jong-Il Park,et al.  Optimal Local Searching for Fast and Robust Textureless 3D Object Tracking in Highly Cluttered Backgrounds , 2014, IEEE Transactions on Visualization and Computer Graphics.

[16]  Bodo Rosenhahn,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Combined Region-and Motion-based 3d Tracking of Rigid and Articulated Objects , 2022 .

[17]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Anthony J. Yezzi,et al.  A Geometric Approach to Joint 2D Region-Based Segmentation and 3D Pose Estimation Using a 3D Shape Prior , 2010, SIAM J. Imaging Sci..

[19]  Tae-Kyun Kim,et al.  Latent-Class Hough Forests for 3D Object Detection and Pose Estimation , 2014, ECCV.

[20]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[21]  Richard L. Holloway,et al.  Registration Error Analysis for Augmented Reality , 1997, Presence: Teleoperators & Virtual Environments.

[22]  Eric Brachmann,et al.  6-DOF Model Based Tracking via Object Coordinate Regression , 2014, ACCV.

[23]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[24]  Henrik I. Christensen,et al.  RGB-D object tracking: A particle filter approach on GPU , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[25]  Hans-Peter Seidel,et al.  A Comparison of Shape Matching Methods for Contour Based Pose Estimation , 2006, IWCIA.

[26]  Jean-François Lalonde,et al.  Deep 6-DOF Tracking , 2017, IEEE Transactions on Visualization and Computer Graphics.

[27]  Sen Wang,et al.  DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[28]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Ian D. Reid,et al.  PWP3D: Real-time Segmentation and Tracking of 3D Objects , 2009, BMVC.

[30]  Vincent Lepetit,et al.  Multiple 3D Object tracking for augmented reality , 2008, 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality.

[31]  Javier Díaz,et al.  Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Stepán Obdrzálek,et al.  On Evaluation of 6D Object Pose Estimation , 2016, ECCV Workshops.

[33]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Ulrich Schwanecke,et al.  Real-Time Monocular Segmentation and Pose Tracking of Multiple Objects , 2016, ECCV.

[35]  Olaf Kähler,et al.  Real-Time 3D Tracking and Reconstruction on Mobile Phones , 2015, IEEE Transactions on Visualization and Computer Graphics.

[36]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Nassir Navab,et al.  Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[39]  Roberto Cipolla,et al.  Real-Time Visual Tracking of Complex Structures , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[40]  Ian D. Reid,et al.  Robust Real-Time Visual Tracking Using Pixel-Wise Posteriors , 2008, ECCV.

[41]  Ulrich Schwanecke,et al.  Real-Time Monocular Pose Estimation of 3D Objects Using Temporally Consistent Local Color Histograms , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[43]  Thomas Brox,et al.  DeMoN: Depth and Motion Network for Learning Monocular Stereo , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).