Head gesture recognition in intelligent interfaces: the role of context in improving recognition

Acknowledging an interruption with a nod of the head is a natural and intuitive communication gesture that can be performed without significantly disturbing a primary interface activity. In this paper we describe vision-based head gesture recognition techniques and their use for common user interface commands. We explore two prototype perceptual interface components that use detected head gestures for dialog box confirmation and document browsing, respectively. Tracking is performed using stereo-based alignment, and recognition proceeds using a trained discriminative classifier. An additional context-learning component exploits interface context to make recognition more robust. User studies with the prototype recognition components indicate quantitative and qualitative benefits of gesture-based confirmation over conventional alternatives.
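
To illustrate how interface context might be fused with a vision-based gesture classifier, the minimal sketch below combines a discriminative score over head-velocity features with a simple context prior derived from interface state (dialog-box visibility, mouse idle time). All names, features, weights, and thresholds here are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
import numpy as np

def gesture_score(window, w, b):
    """Margin of a linear discriminative classifier for a 'nod' gesture.
    `window` is a short buffer of per-frame head rotational velocities
    (yaw, pitch); the weights w, b would be trained offline."""
    return float(np.dot(w, window.ravel()) + b)

def context_prior(dialog_box_visible, mouse_idle_s):
    """Toy context model: a nod is more plausible when a dialog box is
    on screen and the mouse has been idle (attention is on the box)."""
    prior = 0.5
    if dialog_box_visible:
        prior += 0.3
    if mouse_idle_s > 1.0:
        prior += 0.1
    return min(prior, 0.95)

def contextual_nod_decision(window, w, b, dialog_box_visible,
                            mouse_idle_s, threshold=0.6):
    """Combine the vision score and the interface context into one decision."""
    # Squash the classifier margin into [0, 1] with a logistic function,
    # then weight it by the context prior.
    p_vision = 1.0 / (1.0 + np.exp(-gesture_score(window, w, b)))
    p = p_vision * context_prior(dialog_box_visible, mouse_idle_s)
    return p >= threshold, p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=20)            # stand-in for trained weights
    b = 0.0
    window = rng.normal(size=(10, 2))  # 10 frames of (yaw, pitch) velocity
    accept, p = contextual_nod_decision(window, w, b,
                                        dialog_box_visible=True,
                                        mouse_idle_s=2.0)
    print(f"nod detected: {accept} (p = {p:.2f})")
```

Multiplying the vision probability by the context prior reflects the abstract's core idea: a nod should be easier to accept when the interface state actually calls for a confirmation, and harder otherwise.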
