An X-T slice based method for action recognition

This paper proposes a novel method for human action recognition. Unlike many action recognition methods, which consider an action sequence along the time axis, the proposed method views an action sequence along the space axis. This brings two advantages: the feature encodes the human body structure in every frame, and the temporal information is fully used. Feature extraction proceeds as follows. First, an action sequence is cut into slices parallel to the X-T plane. Each slice, which we call an X-T slice, is transformed into a mean histogram and a variance histogram along the T axis. Then all mean histograms and all variance histograms are concatenated separately into two vectors, which are finally encoded with Mel-frequency cepstral coefficients (MFCC). MFCC, a feature commonly used in speech recognition, can effectively capture changes of 1-D signals over time. The encoded values are sent to a classifier for action recognition. Our system is very efficient: on average it needs only 0.02 seconds to process a frame in Matlab.
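
The slicing and statistics steps described above can be sketched in a few lines of NumPy. This is only an illustrative reading of the abstract, not the authors' implementation: it assumes the sequence is a grayscale array of shape (T, H, W), that one X-T slice is taken per image row, and that the "mean histogram" and "variance histogram" of a slice are the per-column mean and variance taken over T. The function name `xt_slice_statistics` is hypothetical, and the MFCC encoding and the classifier are left out.

```python
import numpy as np


def xt_slice_statistics(video):
    """Return concatenated mean and variance vectors over all X-T slices.

    Assumption: `video` has shape (T, H, W), i.e. frames x rows x columns.
    """
    video = np.asarray(video, dtype=np.float64)
    if video.ndim != 3:
        raise ValueError("expected an array of shape (T, H, W)")

    # Reorder to (H, T, W): slices[y] is the X-T slice at image row y.
    slices = np.transpose(video, (1, 0, 2))

    # Collapse the T axis of every slice (assumed meaning of "along the T axis").
    mean_per_slice = slices.mean(axis=1)  # shape (H, W)
    var_per_slice = slices.var(axis=1)    # shape (H, W)

    # Concatenate the per-slice results into two separate 1-D vectors.
    return mean_per_slice.reshape(-1), var_per_slice.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((10, 4, 6))  # 10 frames of 4 x 6 pixels
    mean_vec, var_vec = xt_slice_statistics(clip)
    print(mean_vec.shape, var_vec.shape)
```

Under these assumptions each of the two vectors has H × W entries, and it is these 1-D signals that would then be passed to an MFCC encoder.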
