Harris-SIFT Descriptor for Video Event Detection Based on a Machine Learning Approach

Video data is becoming increasingly important in many commercial and scientific areas, driven by applications such as digital broadcasting, video-conferencing and multimedia processing tools, and by the development of the hardware and communications infrastructure needed to support visual applications. The objective of this work is to propose a method for event detection in a video stream. We combine the Harris-SIFT descriptor with motion information in order to detect human actions in video. We tested our method on the KTH database and compared it to the space-time interest points (STIP) descriptor. Our method achieved results similar to those of the STIP method.
