Online human gesture recognition from motion data streams

Online human gesture recognition has a wide range of applications in computer vision, especially in human-computer interaction. The recent introduction of cost-effective depth cameras has spurred new research on body-movement gesture recognition. However, two major challenges remain: i) how to continuously recognize gestures from unsegmented streams, and ii) how to distinguish different styles of the same gesture from other types of gestures. In this paper, we address both problems with a new, effective, and efficient feature extraction method that uses a dynamic matching approach to construct a feature vector for each frame. The method increases sensitivity to the features that separate different gestures and decreases sensitivity to the features that vary among gestures within the same class. Comprehensive experiments on the MSRC-12 Kinect Gesture and MSR-Action3D datasets demonstrate performance superior to state-of-the-art approaches.
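The abstract does not specify how the dynamic matching is carried out, so the following is only an illustrative sketch of the general idea of building a per-frame feature vector from an unsegmented stream. It assumes subsequence dynamic time warping as a stand-in matching technique and one exemplar template per gesture class; the function name `frame_features` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def frame_features(stream, templates):
    """Per-frame features from an unsegmented stream (illustrative only).

    stream:    array of shape (T, d), one pose vector per frame.
    templates: list of arrays of shape (m_k, d), one exemplar per gesture
               class (an assumption made for this sketch).

    Returns an array of shape (T, K). Entry [t, k] is the lowest
    subsequence-DTW cost of aligning template k with any stream segment
    that ends at frame t.
    """
    stream = np.asarray(stream, dtype=float)
    n_frames = len(stream)
    feats = np.empty((n_frames, len(templates)))

    for k, tmpl in enumerate(templates):
        tmpl = np.asarray(tmpl, dtype=float)
        m = len(tmpl)
        # prev[i] is the best cost of matching the first i template frames
        # to a segment ending at the previous stream frame.
        # prev[0] = 0 lets a match start at any frame, so no
        # pre-segmentation of the stream is needed.
        prev = np.full(m + 1, np.inf)
        prev[0] = 0.0
        for t in range(n_frames):
            cur = np.full(m + 1, np.inf)
            cur[0] = 0.0
            # Euclidean distance between frame t and every template frame.
            cost = np.linalg.norm(tmpl - stream[t], axis=1)
            for i in range(1, m + 1):
                cur[i] = cost[i - 1] + min(prev[i], prev[i - 1], cur[i - 1])
            feats[t, k] = cur[m]
            prev = cur
    return feats
```

Each row of the result can be passed to a frame-level classifier. A low value in column k at frame t suggests that an instance of gesture k may have just ended at that frame.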
