Simplified Training for Gesture Recognition

Since gesture is a fundamental form of human communication, its recognition by a computer is of strong interest for many applications, such as natural user interfaces and gaming. The popularization of real-time depth sensors brings such applications to the public at large. However, familiar gestures are culture-specific, so their automatic recognition must result from a machine learning process. In particular, this requires either teaching the user how to communicate with the machine, as with popular mobile devices or gaming consoles, or tailoring the application to a specific audience. The latter option serves a large number of applications, such as sports monitoring, virtual reality, or surveillance, although it usually requires a tedious training phase. This work aims to simplify the training required by gesture recognition methods. While the traditional procedure first defines and trains a set of key poses, and then defines and trains a set of gestures, we propose to deduce the set of key poses automatically from the gesture training. We represent a recording of gestures as a curve in a high-dimensional space and robustly segment it according to the curvature of that curve. We then use an asymmetric Hausdorff distance between gestures to define a discriminant key pose as the pose of one gesture most distant from the other. This further allows gestures to be grouped dynamically by similarity. The training only requires the user to perform the gestures and, optionally, to refine the gesture labeling. The generated set of key poses and gestures can then be plugged into existing human action recognition algorithms. Furthermore, this semi-supervised learning allows a previous training session to be reused to extend the set of gestures the computer should be able to recognize. Experimental results show that the automatically generated discriminant key poses lead to recognition accuracy similar to that of previous work.
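The discriminant-key-pose idea above can be sketched with a directed (asymmetric) Hausdorff distance: the key pose of gesture A with respect to gesture B is the pose of A farthest from every pose of B. The following is a minimal illustrative sketch, not the paper's implementation; the toy poses, the Euclidean pose metric, and the function names `pose_dist` and `discriminant_key_pose` are assumptions made for the example.

```python
import math

def pose_dist(p, q):
    # Euclidean distance between two poses,
    # each given as a flat list of joint coordinates.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def discriminant_key_pose(gesture_a, gesture_b):
    """Return the pose of gesture_a realizing the directed Hausdorff
    distance h(A, B) = max_{a in A} min_{b in B} d(a, b), i.e. the pose
    of A farthest from any pose of B, together with that distance."""
    best_pose, best_dist = None, -1.0
    for p in gesture_a:
        nearest = min(pose_dist(p, q) for q in gesture_b)
        if nearest > best_dist:
            best_pose, best_dist = p, nearest
    return best_pose, best_dist

# Toy 2-D "poses": the two gestures nearly coincide except for
# one pose of A, which is therefore its discriminant key pose.
A = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
B = [[0.0, 0.0], [1.0, 0.1]]
key, d = discriminant_key_pose(A, B)
print(key)  # the outlying pose [5.0, 5.0]
```

Because the distance is directed, h(A, B) and h(B, A) generally differ, which is what lets each gesture contribute its own most discriminant pose.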
