Visual and linguistic information in gesture classification

Classification of natural hand gestures is usually approached by applying pattern recognition to the movements of the hand. However, the gesture categories most frequently cited in the psychology literature are fundamentally multimodal: their definitions refer to the surrounding linguistic context. We address the question of whether gestures are naturally multimodal, or whether they can be classified from hand-movement data alone. First, we describe an empirical study showing that removing auditory information significantly impairs human raters' ability to classify gestures. We then present an automatic gesture classification system based solely on an n-gram model of linguistic context; the system is intended to supplement a visual classifier, yet on its own it achieves 66% accuracy on a three-class classification problem. This is higher than the accuracy human raters achieve when presented with the same information.
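The abstract gives no implementation details, but as a rough sketch of what a purely linguistic n-gram classifier of this kind might look like, the example below trains a naive Bayes model over word bigrams drawn from the speech around each gesture. The class names, toy transcripts, bigram features, and add-one smoothing are all illustrative assumptions, not the authors' actual design.

```python
# Hypothetical sketch (not the authors' system): a word-bigram naive Bayes
# classifier over the speech transcribed around each gesture. Class labels,
# example transcripts, and smoothing choices are invented for illustration.
import math
from collections import Counter, defaultdict

def bigrams(words):
    """Return the list of word bigrams for a token sequence."""
    return list(zip(words, words[1:]))

class NgramGestureClassifier:
    def __init__(self):
        self.class_counts = Counter()               # gestures seen per class
        self.feature_counts = defaultdict(Counter)  # class -> bigram -> count
        self.vocab = set()                          # all bigrams seen in training

    def train(self, examples):
        """examples: iterable of (transcript_tokens, gesture_class) pairs."""
        for tokens, label in examples:
            self.class_counts[label] += 1
            for bg in bigrams(tokens):
                self.feature_counts[label][bg] += 1
                self.vocab.add(bg)

    def classify(self, tokens):
        """Return the most probable class under add-one-smoothed naive Bayes."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / total)                     # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for bg in bigrams(tokens):
                num = self.feature_counts[label][bg] + 1        # add-one smoothing
                score += math.log(num / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy usage with invented transcripts and illustrative gesture classes
clf = NgramGestureClassifier()
clf.train([
    ("this one right here".split(), "deictic"),
    ("it spins around like this".split(), "iconic"),
    ("and then we you know move on".split(), "beat"),
])
print(clf.classify("right over here".split()))
```

The same structure extends to unigrams or trigrams by swapping the feature extractor; the point is only that every feature comes from the linguistic channel, with no hand-movement input at all.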
