Conversational interfaces: advances and challenges

The past decade has witnessed the emergence of a new breed of human-computer interfaces that combines several human language technologies to enable humans to converse with computers using spoken dialogue for information access, creation and processing. In this paper, we introduce the nature of these conversational interfaces and describe the underlying human language technologies on which they are based. After summarizing some of the recent progress in this area around the world, we discuss development issues faced by researchers creating these kinds of systems and present some of the ongoing and unmet research challenges in this field.
