Spoken and Multimodal Communication Systems in Mobile Settings

Mobile devices, such as smartphones, have become powerful enough to support efficient speech-based and multimodal interfaces, and there is a growing need for such systems. This chapter gives an overview of the design and development issues involved in implementing mobile speech-based and multimodal systems. It reviews infrastructure design solutions that make it possible to distribute the user interface between servers and mobile devices, and that support the migration of user interfaces from server-based to distributed services. An example is given of how an existing server-based spoken timetable application is turned into a multimodal, distributed mobile application.
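As an illustration of the kind of distribution the chapter discusses, the sketch below shows one way a mobile client might route speech recognition either to a server-side recognizer or to a lightweight on-device recognizer, depending on connectivity. This is a minimal, hypothetical example, not the chapter's actual architecture; the class and function names are invented for illustration, and the recognizers are stand-ins for real components.

```python
# Illustrative sketch: a mobile client that distributes speech
# recognition between a server and the device itself, falling back to
# an embedded recognizer when no network connection is available.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Recognizer:
    """A named recognition backend wrapping a recognize function."""
    name: str
    recognize: Callable[[bytes], str]


def server_recognize(audio: bytes) -> str:
    # Stand-in for a network call to a large-vocabulary server recognizer.
    return f"server-result({len(audio)} bytes)"


def embedded_recognize(audio: bytes) -> str:
    # Stand-in for a small-vocabulary on-device recognizer.
    return f"embedded-result({len(audio)} bytes)"


class DistributedSpeechClient:
    """Routes recognition to the server when connected, else on-device."""

    def __init__(self, connected: bool):
        self.server = Recognizer("server", server_recognize)
        self.embedded = Recognizer("embedded", embedded_recognize)
        self.connected = connected

    def recognize(self, audio: bytes) -> str:
        backend = self.server if self.connected else self.embedded
        return backend.recognize(audio)


client = DistributedSpeechClient(connected=False)
print(client.recognize(b"\x00" * 160))  # falls back to the embedded recognizer
```

In a real system the fallback decision would also weigh latency, vocabulary size, and battery use, and migration from a purely server-based service would mean gradually moving interface components behind such a routing layer.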
