Efficient Spatio-Temporal Edge Descriptor

Concept-based video retrieval is a growing area of multimedia content analysis research. Spatio-temporal descriptors have long been regarded as a promising way to bridge the semantic gap in content-based video retrieval, capturing information that purely visual retrieval methods cannot. In this paper we propose a spatio-temporal descriptor, ST-MP7EH, that addresses some of the challenges encountered in practical systems, and we present experimental results supporting our participation in the TRECVid 2011 Semantic Indexing task. The descriptor combines the MPEG-7 Edge Histogram descriptor with motion information and is designed to be computationally efficient, scalable, and highly parallelizable. We show that our descriptor performs well in SVM classification compared with a baseline spatio-temporal descriptor inspired by state-of-the-art systems that rank highly at TRECVid. We highlight the importance of the temporal component by comparing against the original edge histogram descriptor, and we demonstrate the potential of feature fusion with other classifiers.
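
The abstract does not detail how the edge and motion components are combined, but the general construction (a per-frame MPEG-7-style edge histogram, aggregated over time) can be illustrated with a minimal sketch. The 80-bin layout below follows the standard MPEG-7 Edge Histogram definition (4x4 sub-images, five edge types per sub-image); the temporal mean/variance aggregation, the threshold value, and all function names are illustrative assumptions, not the authors' actual design.

```python
import numpy as np

# 2x2 MPEG-7 edge filters: vertical, horizontal, 45-degree, 135-degree,
# and non-directional edges.
EDGE_FILTERS = np.array([
    [[1.0, -1.0], [1.0, -1.0]],               # vertical
    [[1.0, 1.0], [-1.0, -1.0]],               # horizontal
    [[np.sqrt(2), 0.0], [0.0, -np.sqrt(2)]],  # 45 degrees
    [[0.0, np.sqrt(2)], [-np.sqrt(2), 0.0]],  # 135 degrees
    [[2.0, -2.0], [-2.0, 2.0]],               # non-directional
])

def edge_histogram(gray, blocks=4, block_size=8, threshold=11.0):
    """80-bin MPEG-7-style local edge histogram for one grayscale frame."""
    h, w = gray.shape
    hist = np.zeros((blocks, blocks, 5))
    sub_h, sub_w = h // blocks, w // blocks
    half = block_size // 2
    for i in range(blocks):
        for j in range(blocks):
            sub = gray[i * sub_h:(i + 1) * sub_h, j * sub_w:(j + 1) * sub_w]
            # Slide non-overlapping image blocks over the sub-image.
            for y in range(0, sub.shape[0] - block_size + 1, block_size):
                for x in range(0, sub.shape[1] - block_size + 1, block_size):
                    blk = sub[y:y + block_size, x:x + block_size]
                    # Mean intensity of the block's 2x2 quadrants.
                    q = np.array([
                        [blk[:half, :half].mean(), blk[:half, half:].mean()],
                        [blk[half:, :half].mean(), blk[half:, half:].mean()],
                    ])
                    strength = np.abs((EDGE_FILTERS * q).sum(axis=(1, 2)))
                    if strength.max() > threshold:
                        hist[i, j, strength.argmax()] += 1
    hist = hist.reshape(-1)
    return hist / max(hist.sum(), 1.0)

def st_mp7eh_sketch(frames):
    """Aggregate per-frame histograms over time (mean + variance) as a
    stand-in for the motion component (an assumption, see lead-in)."""
    per_frame = np.stack([edge_histogram(f) for f in frames])
    return np.concatenate([per_frame.mean(axis=0), per_frame.var(axis=0)])

# Usage on a dummy clip of 30 grayscale frames (8-bit intensity range):
clip = [np.random.rand(240, 320) * 255 for _ in range(30)]
feature = st_mp7eh_sketch(clip)  # 160-dimensional vector (80 means + 80 variances)
```

Because each frame's histogram is computed independently, a feature of this form is straightforward to parallelize across frames, which is consistent with the efficiency and scalability goals stated in the abstract.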
