Continuous Human Action Segmentation and Recognition Using a Spatio-Temporal Probabilistic Framework

In this paper, a framework for automatic human action segmentation and recognition in continuous action sequences is proposed. A star-like figure is proposed to effectively represent the extremities of the human body silhouette. Each human action is thus recorded as a sequence of star-like figure parameters, which is used for action modeling. To model human actions compactly while characterizing their spatio-temporal distributions, the star-like figure parameters are represented by Gaussian mixture models (GMMs). In addition, to address the intrinsic temporal variations in a continuous action sequence, we transform the time sequence of star-like figure parameters into the frequency domain with the discrete cosine transform (DCT) and keep only the first few coefficients, which represent different temporal patterns with significant discriminating power. The experimental results show that the proposed framework can recognize continuous human actions efficiently.
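
The abstract does not spell out how the star-like figure is computed. The sketch below assumes it works like a star skeleton: distances from the silhouette centroid to an ordered list of contour points form a 1-D signal, and local maxima of that signal are taken as extremities (head, hands, feet). The function name `star_figure`, the `smooth` window, and the (angle, normalized distance) output are hypothetical choices for illustration, not the paper's exact parameterization.

```python
import math


def star_figure(contour, smooth=3):
    """Hypothetical star-like figure extraction from an ordered silhouette contour.

    contour: list of (x, y) boundary points in traversal order.
    Returns one (angle_radians, normalized_distance) pair per detected extremity.
    """
    n = len(contour)
    # Assumption: centroid approximated by the mean of the contour points
    # (the silhouette's area centroid is a common alternative).
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]

    # Circular moving average suppresses small boundary noise.
    w = 2 * smooth + 1
    smoothed = [sum(dist[(i + j) % n] for j in range(-smooth, smooth + 1)) / w
                for i in range(n)]

    # Local maxima of the circular distance signal are extremity candidates.
    peaks = [i for i in range(n)
             if smoothed[i] > smoothed[i - 1] and smoothed[i] >= smoothed[(i + 1) % n]]

    # Normalize by the farthest contour point so the parameters are scale invariant.
    scale = max(dist)
    return [(math.atan2(contour[i][1] - cy, contour[i][0] - cx), dist[i] / scale)
            for i in peaks]
```

For a five-lobed synthetic contour such as r(t) = 1 + 0.5 cos(5t), this yields five extremities at the lobe tips. Applying it frame by frame produces the sequence of star-like figure parameters that the later modeling steps consume.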
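
The DCT step can be sketched as follows: each star-figure parameter is treated as its own time series, transformed with the DCT, and truncated to its first few coefficients. The abstract does not state the DCT variant or how many coefficients are kept. This sketch assumes the orthonormal type-II DCT and a hypothetical `num_coeffs=4`; `dct_ii` and `temporal_features` are illustrative names.

```python
import math


def dct_ii(signal, num_coeffs=None):
    """First `num_coeffs` coefficients of the orthonormal type-II DCT (direct O(N*K) sum)."""
    n = len(signal)
    k_max = n if num_coeffs is None else min(num_coeffs, n)
    coeffs = []
    for k in range(k_max):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        # Orthonormal scaling: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def temporal_features(param_sequence, num_coeffs=4):
    """Hypothetical temporal descriptor: low-order DCT coefficients per parameter track.

    param_sequence: list of frames, each a list of star-figure parameters.
    Returns the truncated coefficient vectors of all tracks, concatenated.
    """
    num_params = len(param_sequence[0])
    features = []
    for p in range(num_params):
        track = [frame[p] for frame in param_sequence]
        features.extend(dct_ii(track, num_coeffs))
    return features
```

Slowly varying motion concentrates its energy in the low-order coefficients, which is why truncation can keep most of the discriminating information. The result is a fixed-length vector regardless of sequence length, although under this normalization the coefficient magnitudes still depend on the number of frames.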
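
The abstract also does not say how the GMMs are trained or compared. The sketch below swaps in a standard maximum-likelihood decision rule, which is not necessarily the paper's method. It assumes one pre-trained diagonal-covariance GMM per action, with parameters obtained for example by EM, and given here as a list of `(weight, mean, var)` tuples. Each candidate action's total log-likelihood over the observed feature vectors is computed, and the best-scoring action is chosen. All names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """log p(x) under a GMM given as a list of (weight, mean, var) components.

    Uses the log-sum-exp trick so tiny component densities do not underflow.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Return the action label whose GMM gives the highest total log-likelihood.

    frames: list of feature vectors; models: dict mapping label -> GMM.
    Vectors are scored independently, the usual GMM independence assumption.
    """
    scores = {label: sum(gmm_log_likelihood(x, gmm) for x in frames)
              for label, gmm in models.items()}
    return max(scores, key=scores.get)
```

Summing log-likelihoods instead of multiplying densities keeps the score numerically stable over long sequences. Because the mixture models the distribution of parameters without their order, temporal structure has to come from elsewhere, such as the DCT features.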
