Three-dimensional object tracking based on perspective scale invariant feature transform correspondences

Abstract. Reconstructing three-dimensional (3-D) poses from matched feature correspondences is widely used in 3-D object tracking, and the precision of correspondence matching plays a major role in pose reconstruction. Without prior knowledge of the perspective camera model, state-of-the-art methods only deal with two-dimensional (2-D) planar affine transforms. An interest point detector and descriptor, perspective scale invariant feature transform (SIFT), is proposed to overcome the side effects of viewpoint change; that is, the detector is invariant to viewpoint change. Perspective SIFT keypoints are detected by the SIFT approach, where the sample region is determined by projecting the original sample region onto the image plane using the established camera model. An iterative algorithm then updates the pose of the tracked object, and it generally converges to a 3-D perspective-invariant point. The pose of the tracked object is finally estimated by combining template warping with perspective SIFT correspondences. Thorough evaluations are performed on two public databases, the Biwi Head Pose dataset and the Boston University dataset. Comparisons illustrate that the proposed keypoint detector substantially improves tracking performance.
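The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an ideal pinhole camera without lens distortion, a square sample region lying in a plane of the object, and a known pose (rotation and translation). All function and parameter names (`rotation_y`, `project_point`, `project_sample_region`, `focal`, `cx`, `cy`) are hypothetical and chosen only for illustration.

```python
import math


def rotation_y(angle_rad):
    """Return a 3x3 rotation matrix about the y axis as nested lists."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project_point(point_obj, rotation, translation, focal, cx, cy):
    """Project a 3-D object-space point to pixel coordinates (pinhole model)."""
    # Rigid transform into the camera frame: X_cam = R * X_obj + t
    x, y, z = (
        sum(rotation[i][j] * point_obj[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic offset (assumed square pixels)
    return (focal * x / z + cx, focal * y / z + cy)


def project_sample_region(center_obj, half_size, rotation, translation,
                          focal, cx, cy):
    """Project the four corners of a square patch in the object's z = const plane."""
    px, py, pz = center_obj
    corners = [
        (px - half_size, py - half_size, pz),
        (px + half_size, py - half_size, pz),
        (px + half_size, py + half_size, pz),
        (px - half_size, py + half_size, pz),
    ]
    return [project_point(c, rotation, translation, focal, cx, cy)
            for c in corners]


if __name__ == "__main__":
    # Same patch seen frontally and after a 45 degree rotation about the y axis.
    frontal = project_sample_region((0.0, 0.0, 0.0), 1.0, rotation_y(0.0),
                                    (0.0, 0.0, 10.0), 500.0, 320.0, 240.0)
    rotated = project_sample_region((0.0, 0.0, 0.0), 1.0,
                                    rotation_y(math.radians(45)),
                                    (0.0, 0.0, 10.0), 500.0, 320.0, 240.0)
    print(frontal)
    print(rotated)
```

Under these assumptions, the frontal view projects the patch to a square, while the rotated view projects it to a trapezoid whose nearer edge is longer than its farther edge. A 2-D affine model cannot represent that foreshortening, which is the motivation the abstract gives for sampling the descriptor region through the camera model.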