Toward an Intelligent Multimodal Interface for Natural Interaction

Advances in technology are enabling novel approaches to human-computer interaction (HCI) in a wide variety of devices and settings (e.g., the Microsoft Surface, the Nintendo Wii, and the Apple iPhone). While many of these devices have been commercially successful, the use of multimodal interaction technology is still not well understood from a more principled system design or cognitive science perspective. The long-term goal of our research is to build an intelligent multimodal interface for natural interaction that can serve as a testbed for formulating a more principled system design framework for multimodal HCI. This thesis focuses on the gesture input modality. Using a new hand-tracking technology capable of tracking 3D hand postures in real time, we developed a recognition system for continuous natural gestures. By natural gestures, we mean those encountered in spontaneous interaction, rather than a set of artificial gestures designed for the convenience of recognition. To date we have achieved 96% accuracy on isolated gesture recognition and 74% accuracy on continuous gesture recognition, with data from different users and twelve gesture classes. We connected the gesture recognition system to Google Earth, enabling gestural control of a 3D map. In particular, users can tilt the map in 3D using non-touch-based gestures, which are more intuitive than touch-based ones. We also conducted an exploratory user study to observe natural behavior in an urban search and rescue scenario with a large tabletop display. The qualitative results of the study provide good starting points for understanding how users gesture naturally and how to integrate different modalities. This thesis sets the stage for further development toward our long-term goal.

Thesis Supervisor: Randall Davis
Title: Professor of Electrical Engineering and Computer Science
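The abstract does not name the recognition model, so the sketch below is an assumption rather than the thesis's method: a standard formulation of isolated gesture classification using one hidden Markov model per gesture class, labeling a sequence by the highest-scoring model. The feature dimensionality, state count, function names, and the hmmlearn dependency are all illustrative choices.

```python
# Hypothetical sketch: isolated gesture classification with one Gaussian-emission
# HMM per gesture class. Features, hyperparameters, and library choice are
# illustrative assumptions; they are not taken from the thesis.
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed dependency: pip install hmmlearn

def train_class_models(sequences_by_class, n_states=5):
    """Fit one HMM per gesture class.

    sequences_by_class: dict mapping class label -> list of (T_i, D) arrays,
    each array being one example sequence of D-dimensional hand-pose features
    sampled over T_i frames.
    """
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.concatenate(seqs)          # stack the frames of all examples
        lengths = [len(s) for s in seqs]  # per-example frame counts
        model = GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[label] = model
    return models

def classify(models, seq):
    """Label an isolated gesture by the class whose HMM scores it highest."""
    return max(models, key=lambda label: models[label].score(seq))
```

Continuous recognition, as reported in the abstract, would additionally require segmenting the gesture stream, e.g., with sliding windows or a connected model that includes a non-gesture filler state; that machinery is omitted here.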
