Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost

Our goal is to automatically segment and recognize basic human actions, such as stand, walk and wave hands, from a sequence of joint positions or pose angles. Such recognition is difficult due to high dimensionality of the data and large spatial and temporal variations in the same action. We decompose the high dimensional 3-D joint space into a set of feature spaces where each feature corresponds to the motion of a single joint or combination of related multiple joints. For each feature, the dynamics of each action class is learned with one HMM. Given a sequence, the observation probability is computed in each HMM and a weak classifier for that feature is formed based on those probabilities. The weak classifiers with strong discriminative power are then combined by the Multi-Class AdaBoost (AdaBoost.M2) algorithm. A dynamic programming algorithm is applied to segment and recognize actions simultaneously. Results of recognizing 22 actions on a large number of motion capture sequences as well as several annotated and automatically tracked sequences show the effectiveness of the proposed algorithms.

[1]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[2]  Aaron F. Bobick,et al.  Recognition of human body motion using phase space constraints , 1995, Proceedings of IEEE International Conference on Computer Vision.

[3]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[4]  James W. Davis,et al.  The Representation and Recognition of Action Using Temporal Templates , 1997, CVPR 1997.

[5]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[6]  James W. Davis,et al.  Action Recognition Using Temporal Templates , 1997 .

[7]  James W. Davis,et al.  The representation and recognition of human movement using temporal templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[9]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Ramakant Nevatia,et al.  Large-scale event detection using semi-hidden Markov models , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[12]  James M. Rehg,et al.  Asymmetrically boosted HMM for speech reading , 2004, CVPR 2004.

[13]  Ramakant Nevatia,et al.  3D Human Action Recognition Using Spatio-temporal Motion Templates , 2005, ICCV-HCI.

[14]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Ramakant Nevatia,et al.  Dynamic Human Pose Estimation using Markov Chain Monte Carlo Approach , 2005, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1.