Real-Time Gesture Recognition from Depth Data through Key Poses Learning and Decision Forests

Human gesture recognition is a challenging task with many applications. The spread of real-time depth sensors extends those applications to natural user interfaces (NUIs) for end users. The quality of such an NUI depends heavily on the robustness and execution speed of the gesture recognition. This work introduces a method for real-time gesture recognition from a noisy skeleton stream, such as those extracted from Kinect depth sensors. Each pose is described using a tailored angular representation of the skeleton joints. These descriptors are used to identify key poses through a multi-class classifier derived from support vector machines. The gesture is labeled on the fly from the key pose sequence by a decision forest, which naturally performs the gesture time warping and removes the need for an initial or neutral pose. The proposed method runs in real time and shows robustness in several experiments.
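
To make the first and last stages of this pipeline concrete, the following is a minimal sketch only, not the authors' implementation. The abstract does not specify the tailored angular representation, so the sketch assumes plain spherical angles of a limb vector; the function names `limb_angles` and `key_pose_sequence`, the 3D tuple layout, and the use of `None` for "no key pose detected" are all hypothetical. The second function illustrates one way a key pose sequence can become insensitive to gesture speed: consecutive repeats of the same key pose are collapsed, so a slow and a fast execution yield the same sequence.

```python
import math
from itertools import groupby


def limb_angles(parent, child):
    """Return (theta, phi) for the limb vector parent -> child.

    Assumption: joints are (x, y, z) tuples. theta is the polar angle
    from the z axis in [0, pi]; phi is the azimuth in the x-y plane.
    """
    dx, dy, dz = (c - p for p, c in zip(parent, child))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("parent and child joints coincide")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dz / length))
    return math.acos(cos_theta), math.atan2(dy, dx)


def key_pose_sequence(frame_labels, none_label=None):
    """Collapse per-frame key pose labels into a key pose sequence.

    Frames with no recognized key pose are dropped first, then runs of
    the same label are merged into a single entry.
    """
    detected = (label for label in frame_labels if label != none_label)
    return [label for label, _ in groupby(detected)]


if __name__ == "__main__":
    # A limb pointing along the x axis: polar angle pi/2, azimuth 0.
    print(limb_angles((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
    # Slow and fast executions of the same gesture give the same sequence.
    print(key_pose_sequence(["A", "A", None, "A", "B", "B"]))
    print(key_pose_sequence(["A", "B"]))
```

In the method described above, the per-frame labels would come from the multi-class classifier and the resulting sequence would be passed to the decision forest; both are outside the scope of this sketch.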
