Manitou: A multimodal interaction platform

Multimodal human-computer interaction, which combines multiple input modalities, is key to letting our highly skilled and coordinated communicative behavior control computer systems in a flexible and natural manner. This paper presents a multimodal interaction platform called Manitou. The platform enables researchers and developers to employ speech, gestures, and other modalities in their applications. Key features of the platform are exposed to the web environment to facilitate fast prototyping and delivery of applications with multimodal interfaces. The benefits of the platform are demonstrated through a sample application and a subsequent user-experience survey.
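
As a rough illustration of the kind of web-based multimodal interaction the platform targets, the sketch below fuses a spoken command with a pointing gesture when the two arrive within a short time window (the classic "put that there" pattern). This is a minimal sketch under stated assumptions, not Manitou's actual API: the ModalityEvent type, the fuse helper, and the window length are illustrative inventions, and speech input relies on the browser's standard Web Speech API where available.

```typescript
// Time-window ("late") fusion of speech and pointing in a web page.
interface ModalityEvent {
  modality: "speech" | "pointer";
  payload: string | { x: number; y: number };
  timestamp: number; // milliseconds since epoch
}

const FUSION_WINDOW_MS = 1500; // hypothetical window; a real system would tune this
const buffer: ModalityEvent[] = [];

// Try to pair a new event with a recent event from the other modality.
function fuse(event: ModalityEvent): void {
  // Discard events too old to be fused (arrival order is roughly chronological).
  const cutoff = Date.now() - FUSION_WINDOW_MS;
  while (buffer.length > 0 && buffer[0].timestamp < cutoff) buffer.shift();

  const partner = buffer.find((e) => e.modality !== event.modality);
  if (partner) {
    const [speech, pointer] =
      event.modality === "speech" ? [event, partner] : [partner, event];
    // e.g. "delete this" plus the coordinates of the accompanying click
    console.log(`Fused command: "${speech.payload}" at`, pointer.payload);
    buffer.splice(buffer.indexOf(partner), 1);
  } else {
    buffer.push(event);
  }
}

// Pointing modality: record click/tap coordinates.
document.addEventListener("pointerdown", (e: PointerEvent) => {
  fuse({
    modality: "pointer",
    payload: { x: e.clientX, y: e.clientY },
    timestamp: Date.now(),
  });
});

// Speech modality: the Web Speech API is vendor-prefixed in some browsers
// and absent from the default TypeScript DOM typings, hence the casts.
const SpeechRec =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
if (SpeechRec) {
  const recognition = new SpeechRec();
  recognition.continuous = true;
  recognition.onresult = (event: any) => {
    const results = event.results;
    const transcript = results[results.length - 1][0].transcript.trim();
    fuse({ modality: "speech", payload: transcript, timestamp: Date.now() });
  };
  recognition.start();
}
```

Late fusion of independently timestamped modality streams, as sketched here, is one common integration strategy; frameworks in this space differ mainly in how the fusion step interprets the paired events.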
