Trajectory-based descriptors for dynamic event annotation

Video concept annotation is a challenging problem with strong implications for many applications such as video search and retrieval. In this work, we focus on the recognition of dynamic events (e.g., running, walking) in unconstrained videos. Feature trajectories have been shown to be an efficient video representation [23, 16, 17]. Trajectories are extracted by tracking interest points over several video frames, capturing both motion and appearance information. We take advantage of this representation to characterize dynamic concepts in our event recognition system. At the trajectory extraction step, we investigate a new filtering scheme that retains only trajectories exhibiting significant motion, which helps dynamic event recognition. We also propose two new trajectory-based descriptors. The first captures trajectory motion through its first-order statistics. The second uses the derivative of the trajectory motion, making it invariant to uniform camera motion such as the translation induced by a traveling shot. We evaluate our proposals on the HOHA dataset, a challenging set of videos extracted from Hollywood movies with significant camera motion.
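To make the above pipeline concrete, the following is a minimal Python/NumPy sketch of how the motion-based trajectory filtering and the two proposed descriptors could be computed for a single tracked trajectory. The function names and the motion threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def displacements(traj):
    """Frame-to-frame displacement vectors of a trajectory.

    traj: (T, 2) array of (x, y) positions of one interest point
    tracked over T frames. Returns a (T-1, 2) array.
    """
    return np.diff(traj, axis=0)

def has_significant_motion(traj, min_total_motion=5.0):
    """Filtering step: keep a trajectory only if its accumulated motion
    exceeds a threshold (min_total_motion is an assumed value, in pixels),
    discarding near-static background trajectories."""
    d = displacements(traj)
    return np.linalg.norm(d, axis=1).sum() >= min_total_motion

def first_order_stats_descriptor(traj):
    """Descriptor 1: first-order statistics (mean and standard deviation)
    of the displacement vectors, summarizing the trajectory's motion."""
    d = displacements(traj)
    return np.concatenate([d.mean(axis=0), d.std(axis=0)])

def motion_derivative_descriptor(traj):
    """Descriptor 2: statistics of the derivative of the displacements
    (second-order differences). A uniform camera translation adds the same
    constant offset to every displacement, so it cancels out here."""
    dd = np.diff(displacements(traj), axis=0)
    return np.concatenate([dd.mean(axis=0), dd.std(axis=0)])

if __name__ == "__main__":
    # Toy trajectory: an accelerating motion superimposed on a constant
    # camera translation of roughly one pixel per frame along x.
    t = np.arange(15, dtype=float)
    traj = np.stack([t + 0.05 * t**2, 0.5 * t], axis=1)
    if has_significant_motion(traj):
        print("first-order stats :", first_order_stats_descriptor(traj))
        print("motion derivative :", motion_derivative_descriptor(traj))
```

In a complete system, such per-trajectory descriptors would typically be aggregated over the video (for instance in a bag-of-features histogram) before being passed to a classifier.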

[1] David G. Lowe et al. Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.

[2] Andrew Zisserman et al. Video Google: a text retrieval approach to object matching in videos. Proceedings Ninth IEEE International Conference on Computer Vision, 2003.

[3] Simon Baker et al. Lucas-Kanade 20 Years On: A Unifying Framework. International Journal of Computer Vision, 2004.

[4] Tony Lindeberg et al. Feature Detection with Automatic Scale Selection. International Journal of Computer Vision, 1998.

[5] Martial Hebert et al. Efficient visual event detection using volumetric features. Tenth IEEE International Conference on Computer Vision (ICCV'05), Volume 1, 2005.

[6] Cordelia Schmid et al. A sparse texture representation using local affine regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.

[7] Ivan Laptev et al. On Space-Time Interest Points. International Journal of Computer Vision, 2005.

[8] Cordelia Schmid et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006.

[9] C. Schmid et al. Description of Interest Regions with Center-Symmetric Local Binary Patterns. ICVGIP, 2006.

[10] Patrick Pérez et al. Retrieving actions in movies. 2007 IEEE 11th International Conference on Computer Vision, 2007.

[11] Cordelia Schmid et al. Learning realistic human actions from movies. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[12] Patrick Bouthemy et al. A Statistical Video Content Recognition Method Using Invariant Features on Object Trajectories. IEEE Transactions on Circuits and Systems for Video Technology, 2008.

[13] Chong-Wah Ngo et al. Video event detection using motion relativity and visual relatedness. ACM Multimedia, 2008.

[14] Christopher Joseph Pal et al. Activity recognition using the velocity histories of tracked keypoints. 2009 IEEE 12th International Conference on Computer Vision, 2009.

[15] Ehud Rivlin et al. Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2009.

[16] Bingbing Ni et al. Contextualizing histogram. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.

[17] Martial Hebert et al. Trajectons: Action recognition through the motion analysis of tracked features. 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), 2009.

[18] Jintao Li et al. Hierarchical spatio-temporal context modeling for action recognition. CVPR, 2009.

[19] Koen E. A. van de Sande et al. Evaluating Color Descriptors for Object and Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.

[20] Loong Fah Cheong et al. Activity recognition using dense long-duration trajectories. 2010 IEEE International Conference on Multimedia and Expo, 2010.

[21] Alberto Del Bimbo et al. Event detection and recognition for semantic annotation of video. Multimedia Tools and Applications, 2010.

[22] Yiannis Kompatsiaris et al. Local Invariant Feature Tracks for high-level video feature extraction. 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 10), 2010.

[23] Gunnar Rätsch et al. The SHOGUN Machine Learning Toolbox. Journal of Machine Learning Research, 2010.

[24] Stefano Soatto et al. Tracklet Descriptors for Action Modeling and Video Analysis. ECCV, 2010.

[25] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints. 2011.