A Novel Representation of Parts for Accurate 3D Object Detection and Tracking in Monocular Images

We present a method that estimates in real-time and under challenging conditions the 3D pose of a known object. Our method relies only on grayscale images since depth cameras fail on metallic objects, it can handle poorly textured objects, and cluttered, changing environments, the pose it predicts degrades gracefully in presence of large occlusions. As a result, by contrast with the state-of-the-art, our method is suitable for practical Augmented Reality applications even in industrial environments. To be robust to occlusions, we first learn to detect some parts of the target object. Our key idea is to then predict the 3D pose of each part in the form of the 2D projections of a few control points. The advantages of this representation is three-fold: We can predict the 3D pose of the object even when only one part is visible, when several parts are visible, we can combine them easily to compute a better pose of the object, the 3D pose we obtain is usually very accurate, even when only few parts are visible.

[1]  Kun He,et al.  Parameterizing Object Detectors in the Continuous Pose Space , 2014, ECCV.

[2]  Sinisa Todorovic,et al.  From contours to 3D object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[3]  J. L. Roux An Introduction to the Kalman Filter , 2003 .

[4]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[5]  Tinne Tuytelaars,et al.  Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Dima Damen,et al.  Real-time Learning and Detection of 3D Texture-less Objects: A Scalable Approach , 2012, BMVC.

[7]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[8]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Dieter Fox,et al.  A Scalable Tree-Based Approach for Joint Object and Pose Recognition , 2011, AAAI.

[10]  Vincent Lepetit,et al.  Pose Priors for Simultaneously Solving Alignment and Correspondence , 2008, ECCV.

[11]  Javier Díaz,et al.  Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Vincent Lepetit,et al.  Stable real-time 3D tracking using online and offline information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  David Joseph Tan,et al.  Multi-forest Tracker: A Chameleon in Tracking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Antonio Torralba,et al.  FPM: Fine Pose Parts-Based Model with 3D CAD Models , 2014, ECCV.

[15]  Vincent Lepetit,et al.  Learning descriptors for object recognition and 3D pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Georgios Chliveros,et al.  Robust Multi-hypothesis 3D Object Pose Tracking , 2013, ICVS.

[17]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[18]  Dieter Schmalstieg,et al.  Pose tracking from natural features on mobile phones , 2008, 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality.

[19]  Ian D. Reid,et al.  Simultaneous Monocular 2D Segmentation, 3D Pose Recovery and 3D Reconstruction , 2012, ACCV.

[20]  Vincent Lepetit,et al.  Online learning of patch perspective rectification for efficient object detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Nassir Navab,et al.  Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Ian Reid,et al.  Simultaneous Monocular 2 D Segmentation , 3 D Pose Recovery and 3 D Reconstruction , 2012 .

[23]  Henrik I. Christensen,et al.  RGB-D edge detection and edge-based registration , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[24]  Reinhard Koch,et al.  Perspectively Invariant Normal Features , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Chris Harris,et al.  RAPID - a video rate object tracker , 1990, BMVC.

[27]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[28]  David G. Lowe,et al.  Scene modelling, recognition and tracking with invariant image features , 2004, Third IEEE and ACM International Symposium on Mixed and Augmented Reality.

[29]  Vincent Lepetit,et al.  Gradient Response Maps for Real-Time Detection of Textureless Objects , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Federico Tombari,et al.  BOLD Features to Detect Texture-less Objects , 2013, 2013 IEEE International Conference on Computer Vision.

[31]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[32]  Silvio Savarese,et al.  Monocular Multiview Object Tracking with 3D Aspect Parts , 2014, ECCV.

[33]  S. Umeyama,et al.  Least-Squares Estimation of Transformation Parameters Between Two Point Patterns , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  G C Dean,et al.  An Introduction to Kalman Filters , 1986 .

[35]  Abhinav Gupta,et al.  Building Part-Based Object Detectors via 3D Geometry , 2013, 2013 IEEE International Conference on Computer Vision.

[36]  Wolfram Burgard,et al.  A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[37]  Peter V. Gehler,et al.  Teaching 3D geometry to deformable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  David G. Lowe,et al.  Fitting Parameterized Three-Dimensional Models to Images , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[39]  Francisco José Madrid-Cuevas,et al.  Automatic generation and detection of highly reliable fiducial markers under occlusion , 2014, Pattern Recognit..

[40]  Luca Maria Gambardella,et al.  Fast image scanning with deep max-pooling convolutional neural networks , 2013, 2013 IEEE International Conference on Image Processing.

[41]  Antonis A. Argyros,et al.  Scalable 3D Tracking of Multiple Interacting Objects , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Y. Oshman,et al.  Averaging Quaternions , 2007 .

[43]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[44]  Tae-Kyun Kim,et al.  Latent-Class Hough Forests for 3D Object Detection and Pose Estimation , 2014, ECCV.

[45]  Ian D. Reid,et al.  PWP3D: Real-time Segmentation and Tracking of 3D Objects , 2009, BMVC.

[46]  Jianxiong Xiao,et al.  Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[47]  Tom Drummond,et al.  Fusing points and lines for high performance tracking , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[48]  David W. Murray,et al.  Full-3D Edge Tracking with a Particle Filter , 2006, BMVC.