Ongoing human action recognition with motion capture

Ongoing human action recognition is a challenging problem with many applications, such as video surveillance, patient monitoring, and human-computer interaction. This paper presents a novel framework for recognizing streamed actions using Motion Capture (MoCap) data. Unlike after-the-fact classification of completed activities, this work aims at early recognition of ongoing activities. The proposed method is time efficient because it is based on histograms of action poses, extracted from MoCap data and computed according to the Hausdorff distance. The histograms are then compared using the Bhattacharyya distance and warped by a dynamic time warping process to achieve their optimal alignment. This process, implemented by our dynamic programming-based solution, allows some stretching flexibility to accommodate possible changes in action length. We demonstrate the effectiveness of our solution by testing it on large datasets and comparing it with several state-of-the-art methods. In particular, we achieve excellent recognition rates that outperform many well-known methods.

Highlights: Human motion interpretation from Motion Capture systems. Histogram of poses analysis. Ongoing activity recognition.
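The comparison and alignment steps named in the abstract can be illustrated with a minimal sketch. It assumes each action is already represented as a sequence of pose histograms; how those histograms are built from MoCap data (the Hausdorff-based step) is not specified in the abstract and is not shown here. The function names `bhattacharyya_distance` and `dtw_cost`, and the use of the negative-log form of the Bhattacharyya distance, are illustrative assumptions and not the authors' implementation.

```python
import math

def bhattacharyya_distance(p, q, eps=1e-12):
    """Bhattacharyya distance between two non-negative histograms.

    Both histograms are normalized to sum to 1 before comparison.
    The -log form used here is an assumption of this sketch.
    """
    sp, sq = sum(p), sum(q)
    # Bhattacharyya coefficient: overlap between the two distributions.
    bc = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    bc = min(1.0, bc)  # guard against floating-point overshoot
    return max(0.0, -math.log(max(bc, eps)))

def dtw_cost(seq_a, seq_b, dist=bhattacharyya_distance):
    """Classic dynamic-programming DTW cost between two histogram sequences.

    Allowing a frame of one sequence to match several frames of the other
    is what gives the alignment its stretching flexibility.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning seq_a[:i] with seq_b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Toy data (hypothetical): two pose histograms and a time-stretched repetition.
h1, h2 = [1.0, 0.0], [0.0, 1.0]
template = [h1, h2]
stretched = [h1, h1, h2, h2]
reversed_action = [h2, h1]
```

In this toy setup, `dtw_cost(template, stretched)` is zero because the stretched sequence only repeats frames, whereas `dtw_cost(template, reversed_action)` is strictly positive. For early recognition of a streamed action, one plausible variant would compare the frames observed so far against prefixes of each template, but that choice is outside what the abstract states.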
