Video object matching based on SIFT algorithm

SIFT (scale-invariant feature transform) is applied to the visual tracking problem, in which the appearance of both the tracked object and the scene background changes during tracking. The algorithm has five major stages: scale-space extrema detection, keypoint localization, orientation assignment, keypoint descriptor computation, and keypoint matching. In the first frame, the object is selected as the template and its SIFT features are computed. SIFT features are then computed for each subsequent frame. The Euclidean distance between the template's SIFT features and those of each frame is used to match keypoints and thereby locate the object accurately in that frame. Experimental results on real video sequences demonstrate the effectiveness of this approach and show that the algorithm is robust and suitable for real-time use. It can handle translation, rotation, and affine distortion between images, and it plays an important role in video object tracking and video object retrieval.
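The matching stage described above can be illustrated with a minimal sketch. It assumes the SIFT descriptors have already been computed and are available as plain lists of numbers. The function names, the ratio threshold of 0.8 (Lowe's nearest-to-second-nearest ratio test), and the use of the mean of matched keypoint coordinates as the object position are illustrative assumptions, not details taken from the paper. The toy descriptors are 4-dimensional for readability, whereas real SIFT descriptors have 128 dimensions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(template_desc, frame_desc, ratio=0.8):
    """Match each template descriptor to its nearest frame descriptor.

    A match is kept only if the nearest distance is clearly smaller than
    the second-nearest distance (ratio test; the threshold is an assumption).
    Returns a list of (template_index, frame_index) pairs.
    """
    matches = []
    for i, t in enumerate(template_desc):
        dists = sorted((euclidean(t, f), j) for j, f in enumerate(frame_desc))
        if not dists:
            continue
        # With a single candidate there is no second neighbour to compare to.
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches


def estimate_position(matches, frame_keypoints):
    """Hypothetical position estimate: mean (x, y) of matched frame keypoints."""
    if not matches:
        return None
    xs = [frame_keypoints[j][0] for _, j in matches]
    ys = [frame_keypoints[j][1] for _, j in matches]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Toy data: two template descriptors, three frame descriptors with locations.
template_desc = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
frame_desc = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.95, 0.05, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
frame_keypoints = [(10, 20), (30, 40), (100, 100)]

matches = match_descriptors(template_desc, frame_desc)
position = estimate_position(matches, frame_keypoints)
print(matches, position)
```

In practice, a geometric verification step (for example, fitting an affine transform to the matched keypoints) would normally replace the simple mean, since a mean is sensitive to false matches.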
