A robust SIFT-based descriptor for video classification

The sheer volume of video available today has made objective (or semi-objective) video classification a popular research topic. Among the descriptors used for video classification, SIFT and LIFT can yield highly accurate classifiers. However, the SIFT descriptor does not take video motion into account, and LIFT is time-consuming to compute. This paper proposes a robust descriptor for semi-supervised, content-based video classification. It retains the benefits of the LIFT and SIFT descriptors and overcomes their shortcomings to some extent. To extract the descriptor, SIFT is applied first, and the motion of the extracted keypoints is then used to improve the accuracy of the subsequent classification stage. Because the SIFT descriptor is scale invariant, the proposed method is also robust to zooming. In addition, using the global motion of keypoints helps to disregard the local motions introduced by the camera operator during capture. Compared with other approaches that take video motion into account, the proposed descriptor requires less computation. Results on the TRECVID 2006 dataset show that the proposed method is about 15 percent more accurate than SIFT in content-based video classification.
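The idea of combining SIFT descriptors with the global motion of keypoints can be illustrated with a minimal sketch. It is not the paper's actual formulation, which the abstract does not specify. The sketch assumes that keypoints have already been detected and matched between two consecutive frames, takes the median displacement as a stand-in for global motion, and builds a frame-level feature by appending that motion vector to the mean SIFT descriptor. The function names `global_motion` and `frame_feature` are hypothetical.

```python
import numpy as np


def global_motion(prev_pts, next_pts):
    """Estimate global motion as the median displacement of matched keypoints.

    Assumption: the median is used here because it is insensitive to a few
    keypoints that move differently from the rest (local motion).
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    next_pts = np.asarray(next_pts, dtype=float)
    displacements = next_pts - prev_pts      # one (dx, dy) row per keypoint
    return np.median(displacements, axis=0)  # shape (2,)


def frame_feature(sift_descriptors, prev_pts, next_pts):
    """Concatenate the mean SIFT descriptor with the global motion vector.

    Assumption: averaging descriptors is only one simple way to pool them;
    the abstract does not say how the descriptor is aggregated.
    """
    sift_descriptors = np.asarray(sift_descriptors, dtype=float)
    pooled = sift_descriptors.mean(axis=0)   # shape (128,) for standard SIFT
    return np.concatenate([pooled, global_motion(prev_pts, next_pts)])


if __name__ == "__main__":
    # Four matched keypoints: three follow the same shift, one moves on its own.
    prev_pts = [[0, 0], [10, 10], [20, 5], [5, 5]]
    next_pts = [[2, 1], [12, 11], [22, 6], [35, -15]]
    descriptors = np.ones((4, 128))          # placeholder SIFT descriptors
    print(global_motion(prev_pts, next_pts))
    print(frame_feature(descriptors, prev_pts, next_pts).shape)
```

In this toy input, the single keypoint that moves independently does not shift the median, which is the sense in which a global estimate can ignore local motion.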
