Spatio-temporal context kernel for activity recognition

Previous approaches to action recognition often rely on local space-time features and a bag-of-features (BOF) representation. For complicated human activities, however, the limitations of these approaches become pronounced because the features are purely local and carry no context. This paper addresses the problem by exploiting the spatio-temporal context information between features. We first define a spatio-temporal context, which combines the scale-invariant spatio-temporal neighborhood of local features with the spatio-temporal relationships between them. Then, we introduce a spatio-temporal context kernel (STCK), which takes into account not only the local properties of features but also their spatial and temporal context information. STCK has a promising generalization property and can be plugged into SVMs for activity recognition. Experimental results on challenging activity datasets show that, compared to a context-free model, the spatio-temporal context kernel improves recognition performance.
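To make the idea of a context-aware kernel concrete, the following is a minimal sketch and not the STCK formulation from the paper, whose exact definition is not given in this abstract. It assumes each video is a list of hypothetical `(descriptor, (x, y, t), scale)` tuples, builds an illustrative context descriptor from the neighbors within `radius * scale` of each feature, and multiplies a Gaussian similarity on local descriptors by a Gaussian similarity on context descriptors. The names `context_vector`, `raw_kernel`, `context_kernel`, and the parameters `radius` and `gamma` are assumptions made for illustration only.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian similarity between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def context_vector(features, i, radius=2.0):
    """Illustrative (assumed) context descriptor for feature i.

    Each feature is (descriptor, (x, y, t), scale). Neighbors are the
    features within radius * scale of feature i, so the neighborhood
    grows with the feature's own scale. The context is the mean neighbor
    descriptor plus the fractions of neighbors that occur earlier and
    later in time than feature i.
    """
    desc_i, (x, y, t), scale = features[i]
    neighbors = []
    for j, (desc_j, (xj, yj, tj), _) in enumerate(features):
        if j == i:
            continue
        dist = math.sqrt((x - xj) ** 2 + (y - yj) ** 2 + (t - tj) ** 2)
        if dist <= radius * scale:
            neighbors.append((desc_j, tj))
    if not neighbors:
        # No neighbors: an all-zero context of the same length.
        return [0.0] * (len(desc_i) + 2)
    n = len(neighbors)
    mean = [sum(d[k] for d, _ in neighbors) / n for k in range(len(desc_i))]
    before = sum(1 for _, tj in neighbors if tj < t) / n
    after = sum(1 for _, tj in neighbors if tj > t) / n
    return mean + [before, after]


def raw_kernel(video_a, video_b, gamma=1.0):
    """Sum over all feature pairs of local similarity times context similarity."""
    ctx_a = [context_vector(video_a, i) for i in range(len(video_a))]
    ctx_b = [context_vector(video_b, j) for j in range(len(video_b))]
    total = 0.0
    for (da, _, _), ca in zip(video_a, ctx_a):
        for (db, _, _), cb in zip(video_b, ctx_b):
            total += rbf(da, db, gamma) * rbf(ca, cb, gamma)
    return total


def context_kernel(video_a, video_b, gamma=1.0):
    """Normalized kernel value; a video compared with itself scores 1."""
    norm = math.sqrt(raw_kernel(video_a, video_a, gamma)
                     * raw_kernel(video_b, video_b, gamma))
    return raw_kernel(video_a, video_b, gamma) / norm


# Two toy "videos" with two features each (made-up values).
video_a = [([1.0, 0.0], (0.0, 0.0, 0.0), 1.0), ([0.0, 1.0], (1.0, 0.0, 1.0), 1.0)]
video_b = [([1.0, 0.0], (0.0, 0.0, 0.0), 1.0), ([0.0, 1.0], (5.0, 5.0, 5.0), 1.0)]
similarity = context_kernel(video_a, video_b)
```

In this sketch the two toy videos share identical local descriptors but differ in how the features are arranged, so the context term lowers their similarity below 1. A Gram matrix built from such a function could, for example, be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.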
