Multi-signal gesture recognition using temporal smoothing hidden conditional random fields

We present a new approach to multi-signal gesture recognition that attends to simultaneous body and hand movements. The system examines temporal sequences of dual-channel input signals, obtained via statistical inference, that indicate 3D body pose and hand pose. Learning gesture patterns from these signals is challenging because of long-range temporal dependencies and a low signal-to-noise ratio (SNR). We incorporate a Gaussian temporal-smoothing kernel into the inference framework, which captures long-range temporal dependencies and increases the SNR efficiently. An extensive set of experiments allows us to (1) show that combining body and hand signals significantly improves recognition accuracy; (2) report which features of the body and hands are most informative; and (3) show that Gaussian temporal smoothing significantly improves gesture recognition accuracy.
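As a rough illustration of the smoothing idea only, the sketch below convolves each feature dimension of a pose sequence with a normalized Gaussian window along the time axis. It is not the inference framework described above: the function names `gaussian_kernel` and `smooth_sequence`, the window length, the value of `sigma`, and the edge-padding choice are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(window: int, sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel of odd length `window` (hypothetical helper)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    offsets = np.arange(window) - window // 2
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    return weights / weights.sum()  # weights sum to 1, so constant signals are preserved


def smooth_sequence(signal, window: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Smooth a (T, D) feature sequence along time with a Gaussian window.

    Assumption: the sequence is padded by repeating its first and last frames,
    so the output has the same number of frames as the input.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]  # treat a 1-D input as a single feature channel
    kernel = gaussian_kernel(window, sigma)
    half = window // 2
    padded = np.pad(signal, ((half, half), (0, 0)), mode="edge")
    # Convolve each feature dimension independently; "valid" restores length T.
    columns = [
        np.convolve(padded[:, d], kernel, mode="valid")
        for d in range(signal.shape[1])
    ]
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 2.0 * np.pi, 200)
    clean = np.stack([np.sin(t), np.cos(t)], axis=1)  # toy 2-D "pose" trajectory
    noisy = clean + rng.normal(scale=0.5, size=clean.shape)
    smoothed = smooth_sequence(noisy, window=9, sigma=2.0)
    print("noisy error:   ", np.mean((noisy - clean) ** 2))
    print("smoothed error:", np.mean((smoothed - clean) ** 2))
```

A wider window averages over more frames, which suppresses more noise but also blurs fast movements, so the window length would in practice be tuned on held-out data.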
