Design of multimodal dialogue-based systems

Multimodal dialogue systems integrate advanced (often spoken) language technologies with human-computer interaction methods. Such complex systems cannot be designed without extensive human expertise and systematic design guidelines that take into account the limitations of the underlying technologies. This thesis therefore aims to reduce the time and effort needed to build such systems by creating application-independent techniques, tools and algorithms that automate the design process and make it accessible to non-expert application developers.

The thesis proposes an interactive system prototyping methodology which, together with its software implementation, allows for the rapid building of multimodal dialogue-based information-seeking systems. When designed with our methodology, even partially implemented system prototypes can immediately be tested with users through Wizard of Oz simulations (which are integrated into the methodology), revealing models of user behavior and modality use. Involving users in the early development phases increases the chances that the targeted system will be well accepted by end-users.

With respect to dialogue system design, we propose a two-layered dialogue model as a variant of the standard frame-based approach, in which the two layers correspond to local and global dialogue strategies. One of the important findings of our research is that this two-layered dialogue model extends easily to multimodal systems. The methodology is illustrated in full detail through the design and implementation of the Archivus system, a multimodal (mouse, pen, touchscreen, keyboard and voice) interface that allows users to access and search a database of recorded and annotated meetings (the Smart Meeting Room application).
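The separation of local and global strategies in a frame-based model can be sketched as follows. This is a minimal illustration under our own naming assumptions (the `Slot`, `DialogueManager`, `local_step` and `next_slot` names are hypothetical), not the thesis implementation:

```python
# Minimal sketch of a two-layered, frame-based dialogue model:
# the local layer handles strategies for a single slot (prompt, confirm),
# the global layer decides which slot of the frame to address next.
# All class and method names are illustrative, not the thesis implementation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    name: str
    value: Optional[str] = None
    confirmed: bool = False

class DialogueManager:
    def __init__(self, slot_names):
        # The frame: one slot per search constraint.
        self.frame = {n: Slot(n) for n in slot_names}

    # Local layer: strategy for one slot (ask, then explicitly confirm).
    def local_step(self, slot: Slot) -> str:
        if slot.value is None:
            return f"Please provide the {slot.name}."
        return f"Did you say {slot.name} = {slot.value!r}?"

    # Global layer: pick the next incomplete slot, or finish the frame.
    def next_slot(self) -> Optional[Slot]:
        for slot in self.frame.values():
            if slot.value is None or not slot.confirmed:
                return slot
        return None

    def step(self) -> str:
        slot = self.next_slot()
        if slot is None:
            return "All constraints gathered; running the search."
        return self.local_step(slot)

dm = DialogueManager(["date", "speaker", "topic"])
print(dm.step())                  # prompts for the first empty slot
dm.frame["date"].value = "2004-03-15"
print(dm.step())                  # local confirmation strategy for that slot
```

A different global strategy (e.g. constraint relaxation or switching initiative) would be implemented by replacing `next_slot` alone, leaving the local layer untouched; this independence of the two layers is what makes the model easy to extend, for instance to further input modalities.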
The final part of the thesis is dedicated to an overall qualitative evaluation of the Archivus system (user performance, satisfaction, and an analysis of encountered problems) and to a quantitative evaluation of all the implemented dialogue strategies. Our methodology is intended (1) for designers who want to quickly develop a multimodal system in their application domain, (2) for researchers who want to better understand human-machine multimodal interaction by experimenting with working prototypes, (3) for researchers who want to test new modalities within the context of a complete application, and (4) for researchers interested in new approaches to specific issues in multimodal systems (e.g. the multimodal fusion problem).
