HAcK: A system for the recognition of human actions by kernels of visual strings

In this paper we propose HAcK, a novel method for recognizing Human Actions by string Kernel. The main idea is to represent each action as a sequence of visual characters, namely a string, which models the temporal evolution of the events. Visual characters are extracted by analyzing global descriptors of the scene and by taking advantage of the depth information provided by a Kinect sensor. The similarity between actions is evaluated with a fast global alignment kernel, which makes it possible to handle actions of different lengths as well as the noise introduced during the feature extraction step. HAcK has been evaluated on two standard datasets, and the results, compared with those of state-of-the-art approaches, confirm its effectiveness and its applicability in real environments.
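To make the idea of comparing actions as strings more concrete, the following is a minimal sketch of a global alignment kernel computed by dynamic programming over two character strings. It is an illustration under stated assumptions, not the HAcK implementation: the function names, the `mismatch_penalty` parameter, and the local similarity between characters (1 for equal characters, `exp(-mismatch_penalty)` otherwise) are hypothetical, and the sketch uses the plain recursion rather than the fast variant mentioned above.

```python
import math


def global_alignment_kernel(s, t, mismatch_penalty=1.0):
    """Sum, over all global alignments of s and t, of the product of
    local similarities between the aligned characters."""

    def local(a, b):
        # Hypothetical local similarity (an assumption of this sketch):
        # 1 for identical characters, exp(-penalty) for different ones.
        return 1.0 if a == b else math.exp(-mismatch_penalty)

    n, m = len(s), len(t)
    # table[i][j] accumulates the scores of all alignments of s[:i] with t[:j].
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # An alignment ending at (i, j) extends one ending at
            # (i-1, j-1), (i-1, j) or (i, j-1).
            table[i][j] = local(s[i - 1], t[j - 1]) * (
                table[i - 1][j - 1] + table[i - 1][j] + table[i][j - 1]
            )
    return table[n][m]


def normalized_similarity(s, t, mismatch_penalty=1.0):
    """Kernel value rescaled so that a string compared with itself scores 1."""
    k_st = global_alignment_kernel(s, t, mismatch_penalty)
    k_ss = global_alignment_kernel(s, s, mismatch_penalty)
    k_tt = global_alignment_kernel(t, t, mismatch_penalty)
    return k_st / math.sqrt(k_ss * k_tt)


if __name__ == "__main__":
    # Hypothetical action strings: each letter stands for one visual character.
    print(normalized_similarity("aabbc", "aabc"))   # strings of different length
    print(normalized_similarity("aabbc", "ccccc"))
```

Because the recursion sums over every alignment, strings of different lengths can be compared directly, and a few mismatched characters lower the score without reducing it to zero, which matches the tolerance to varying action length and extraction noise described above.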
