A study of manual gesture-based selection for the PEMMI multimodal transport management interface

Operators in traffic control rooms are often required to respond quickly to critical incidents using a complex array of keyboards, mice, very large screen displays and other peripheral equipment. To support the aim of finding more natural interfaces for this challenging application, this paper presents PEMMI (Perceptually Effective Multimodal Interface), a prototype transport management control system that takes video-based manual gesture and speech recognition as inputs. A specific theme within this research is determining the optimum strategy for gesture input, in terms of both single-point target selection and suitable multimodal feedback for selection. It was found that users prefer larger selection areas for targets in gesture interfaces, and tend to select within 44% of the target's selection radius. The minimum effective target size when using 'device-free' gesture interfaces was found to be 80 pixels (on a 1280x1024 screen). This paper also shows that feedback on gesture input via large screens is enhanced by the use of both audio and visual cues to guide the user's multimodal input. Audio feedback in particular was found to improve user response time by an average of 20% over existing gesture selection strategies for multimodal tasks.
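As a rough illustration of how the reported target-size guideline might be applied in a gesture-driven selection layer, the sketch below implements a simple circular hit-test. The 80-pixel minimum size comes from the abstract (treated here as a target diameter, which is an assumption); the function and parameter names are hypothetical and are not part of the PEMMI implementation.

```python
import math

# Minimum effective target size reported in the paper (pixels, on a
# 1280x1024 display). Interpreted here as a target diameter -- an
# assumption for illustration, not a detail taken from PEMMI itself.
MIN_TARGET_SIZE_PX = 80


def is_selected(pointer_x: float, pointer_y: float,
                target_x: float, target_y: float,
                target_size_px: float = MIN_TARGET_SIZE_PX) -> bool:
    """Circular hit-test for a gesture-pointed screen position.

    A minimal sketch: the target is modelled as a circle whose diameter
    is `target_size_px`, and a selection is registered when the pointer
    falls inside that circle.
    """
    radius = target_size_px / 2.0
    distance = math.hypot(pointer_x - target_x, pointer_y - target_y)
    return distance <= radius


# Example: a pointer 30 px from the centre of an 80 px target registers
# as a hit, consistent with the finding that users tend to select well
# within the target's selection radius.
assert is_selected(630.0, 500.0, 600.0, 500.0)
```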
