Video Segmentation Descriptors for Event Recognition

This paper presents a new video motion descriptor built on a multi-scale video segmentation. The segmentation provides a multi-layered output and a connection to the rich interactions that occur between objects at the semantic level. We also emphasize the relationships between motion clusters by introducing a new relative motion descriptor that encapsulates relative motion patterns within a local spatio-temporal neighborhood. Experimental results on the challenging TRECVID MED11 event recognition dataset validate the approach.
