Conditional random field-based gesture recognition with depth information

Abstract. Gesture recognition is useful for human-computer interaction. Its main difficulty is that instances of the same gesture vary in both motion and shape in three-dimensional (3-D) space. We use depth information from Microsoft’s Kinect to detect 3-D human body components and apply a threshold model with a conditional random field to recognize meaningful gestures from continuous motion information. Body gesture recognition is achieved with a framework consisting of two steps. First, a human subject is described by a set of features encoding the angular relationships between body components in 3-D space. Second, the sequence of feature vectors is labeled by a threshold model with a conditional random field, which separates meaningful gestures from other movements. To evaluate the proposed method, we use a public data set, the Microsoft Research Cambridge-12 Kinect gesture database. The experimental results demonstrate that the proposed method recognizes body gestures automatically, efficiently, and effectively.
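
The feature-extraction step described above can be illustrated with a short sketch. The following Python snippet is a minimal illustration, not the authors' implementation; the joint names, the chosen joint triples, and the numerical positions are assumptions made for the example. It computes angular features from 3-D skeleton joints of the kind the Kinect provides; one feature vector is produced per frame, and the resulting sequence is what the threshold-model conditional random field labels.

import numpy as np

# Hypothetical subset of Kinect skeleton joints: each entry is a 3-D
# position in meters (illustrative values, not real sensor output).
skeleton = {
    "shoulder_center": np.array([0.00, 0.50, 2.05]),
    "shoulder_right":  np.array([0.25, 0.45, 2.10]),
    "elbow_right":     np.array([0.40, 0.20, 2.05]),
    "wrist_right":     np.array([0.55, 0.05, 1.95]),
}

def joint_angle(a, b, c):
    # Angle (in radians) at joint b between the segments b->a and b->c.
    u, v = a - b, c - b
    cos_ang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos_ang, -1.0, 1.0))

# Example angular features for one frame: elbow flexion and the angle of
# the upper arm relative to the shoulder line.
features = np.array([
    joint_angle(skeleton["shoulder_right"], skeleton["elbow_right"], skeleton["wrist_right"]),
    joint_angle(skeleton["elbow_right"], skeleton["shoulder_right"], skeleton["shoulder_center"]),
])
print(features)  # a sequence of such per-frame vectors over time is fed to the CRF

Stacking these per-frame vectors over time yields the observation sequence on which the conditional random field assigns gesture or non-gesture labels.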
