Pose sentences: A new representation for action recognition using sequence of pose words

We propose a method for recognizing human actions in videos. Inspired by recent bag-of-words approaches, we represent actions as documents consisting of words, where a word corresponds to the pose in a single frame. Histogram of oriented gradients (HOG) features are used to describe poses, which are then vector quantized to obtain pose words. In contrast to bag-of-words approaches, which represent an action merely as a collection of words and discard its temporal characteristics, we represent a video as an ordered sequence of pose words, that is, as a pose sentence. String matching techniques are then exploited to measure the similarity of two action sequences. In experiments performed on the dataset of Blank et al., a recognition performance of 92% is obtained.
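
The abstract outlines a concrete pipeline: per-frame HOG pose descriptors, vector quantization into a codebook of pose words, and edit-distance matching of the resulting pose sentences. Below is a minimal, illustrative sketch of such a pipeline in Python, assuming per-frame pose descriptors are already computed; the function names, codebook size, and toy data are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of a pose-sentence pipeline, assuming per-frame pose
# descriptors (e.g., HOG vectors) are already available. All names and
# parameters here are illustrative, not from the paper.

import numpy as np


def build_codebook(descriptors, k=16, iters=20, seed=0):
    """Vector-quantize pose descriptors with a plain k-means to get k pose words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center (pose word).
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster goes empty.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


def to_pose_sentence(frame_descriptors, centers):
    """Map each frame's descriptor to the index of its nearest codeword."""
    dists = np.linalg.norm(frame_descriptors[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1).tolist()  # ordered sequence of pose words


def edit_distance(a, b):
    """Levenshtein distance between two pose sentences (lists of word indices)."""
    dp = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return int(dp[-1])


def classify(query_sentence, train_sentences, train_labels):
    """Nearest-neighbour classification under length-normalized edit distance."""
    dists = [edit_distance(query_sentence, s) / max(len(query_sentence), len(s))
             for s in train_sentences]
    return train_labels[int(np.argmin(dists))]


if __name__ == "__main__":
    # Toy usage: random 81-D "HOG-like" descriptors for a handful of short clips.
    rng = np.random.default_rng(1)
    all_frames = rng.normal(size=(200, 81))
    codebook = build_codebook(all_frames, k=8)

    train = [to_pose_sentence(rng.normal(size=(30, 81)), codebook) for _ in range(4)]
    labels = ["wave", "walk", "jump", "bend"]
    query = to_pose_sentence(rng.normal(size=(25, 81)), codebook)
    print("predicted action:", classify(query, train, labels))
```

The key design choice reflected here is that, unlike a bag-of-words histogram, the sequence of codeword indices preserves temporal order, so an order-sensitive string distance can be used for matching.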

[1] Yang Wang, et al. Semi-Latent Dirichlet Allocation: A Hierarchical Model for Human Action Recognition, 2007, Workshop on Human Motion.

[2] Mubarak Shah, et al. Recognizing human actions, 2005, VSSN@MM.

[3] Ronen Basri, et al. Actions as space-time shapes, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[4] Pinar Duygulu Sahin, et al. Human Action Recognition Using Distribution of Oriented Rectangular Patches, 2007, Workshop on Human Motion.

[5] David A. Forsyth, et al. Computational Studies of Human Motion: Part 1, Tracking and Motion Synthesis, 2005, Found. Trends Comput. Graph. Vis.

[6] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7] Christian Thurau, et al. Behavior Histograms for Action Recognition and Human Detection, 2007, Workshop on Human Motion.

[8] Thomas B. Moeslund, et al. Motion Primitives for Action Recognition, 2007.

[9] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[10] Maja J. Mataric, et al. Deriving action and behavior primitives from human motion data, 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[11] Thomas B. Moeslund, et al. Motion Primitives and Probabilistic Edit Distance for Action Recognition, 2009, Gesture Workshop.

[12] Fei-Fei Li, et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, 2008.

[13] Sebastian Nowozin, et al. Discriminative Subsequence Mining for Action Classification, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[14] Barbara Caputo, et al. Recognizing human actions: a local SVM approach, 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004 (ICPR 2004).

[15] Juan Carlos Niebles, et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, 2006, BMVC.

[16] Tieniu Tan, et al. A survey on visual surveillance of object motion and behaviors, 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[17] Christopher W. Geib, et al. The meaning of action: a review on action recognition and mapping, 2007, Adv. Robotics.

[18] James W. Davis, et al. The Recognition of Human Movement Using Temporal Templates, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[19] Serge J. Belongie, et al. Behavior recognition via sparse spatio-temporal features, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.