On the use of feature tracks for dynamic concept detection in video

This paper proposes the use of feature tracks for the detection of concepts in video, particularly dynamic concepts. Feature tracks are defined as sets of local interest points found in different frames of a video shot that exhibit spatio-temporal and visual continuity, defining a trajectory in the 2D+Time space. The extraction of feature tracks and the selection and representation of an appropriate subset of them allow the generation of a Bag-of-Spatiotemporal-Words model for the shot, which facilitates capturing the dynamics of video content. The experimental evaluation of the proposed approach highlights how the selection of such feature tracks for the definition of the Bag-of-Spatiotemporal-Words model enhances the results of traditional keyframe-based concept detection techniques.
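The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy frame-to-frame linking rule, the thresholds `max_shift` and `max_desc_dist`, the track representation (mean descriptor plus net displacement), and the pre-computed `codebook` are all assumptions made for illustration only. Interest points are assumed to arrive as `(x, y, descriptor)` tuples from some detector such as SIFT or SURF.

```python
import math


def link_feature_tracks(frames, max_shift=5.0, max_desc_dist=0.5):
    """Greedily link interest points across consecutive frames into tracks.

    frames: one list per frame of (x, y, descriptor) tuples.
    Returns a list of tracks; each track is a list of
    (frame_index, x, y, descriptor) entries.
    The two thresholds are hypothetical parameters enforcing
    spatio-temporal and visual continuity respectively.
    """
    finished, active = [], []
    for t, points in enumerate(frames):
        used = set()          # indices of points already claimed in frame t
        still_active = []
        for track in active:
            _, px, py, pdesc = track[-1]
            best, best_cost = None, None
            for i, (x, y, desc) in enumerate(points):
                if i in used:
                    continue
                shift = math.hypot(x - px, y - py)   # spatial continuity
                ddist = math.dist(desc, pdesc)       # visual continuity
                if shift <= max_shift and ddist <= max_desc_dist:
                    if best is None or ddist < best_cost:
                        best, best_cost = i, ddist
            if best is None:
                finished.append(track)               # track ends here
            else:
                used.add(best)
                x, y, desc = points[best]
                track.append((t, x, y, desc))
                still_active.append(track)
        # Every unmatched point starts a new track.
        for i, (x, y, desc) in enumerate(points):
            if i not in used:
                still_active.append([(t, x, y, desc)])
        active = still_active
    return finished + active


def bag_of_track_words(tracks, codebook, min_length=2):
    """Histogram of tracks over a codebook of spatio-temporal words.

    Each sufficiently long track is described by its mean descriptor
    concatenated with its net displacement (dx, dy); this representation
    is an assumption, chosen only to keep the sketch short.
    """
    hist = [0] * len(codebook)
    for track in tracks:
        if len(track) < min_length:
            continue                                  # track selection step
        dim = len(track[0][3])
        mean_desc = [sum(p[3][k] for p in track) / len(track)
                     for k in range(dim)]
        dx = track[-1][1] - track[0][1]
        dy = track[-1][2] - track[0][2]
        vec = mean_desc + [dx, dy]
        nearest = min(range(len(codebook)),
                      key=lambda c: math.dist(vec, codebook[c]))
        hist[nearest] += 1
    return hist
```

In this sketch a shot is reduced to a single histogram, which could then be passed to any standard classifier in the same way as a keyframe-based bag-of-words vector. The displacement term is what lets the histogram distinguish a moving point from a static one with the same appearance.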
