Spoken Language Dialogue Models

Spoken language interactive systems range from speech-enabled command interfaces to dialogue systems that conduct spoken conversations with the user. In the first case, spoken language serves as an alternative input and output modality: commands that the user could type or select from a menu may also be uttered, and system responses can be given as spoken utterances instead of written text or drawings on the screen, so the whole interaction can be conducted in speech. Spoken dialogue systems, by contrast, are built on models of spoken conversation between participants, which allows more flexible interaction. Although the interactions are limited in their topics, turn-taking principles and conversational strategies, the systems aim at natural human–computer interaction in which the user can interact with the system in an intuitive manner. Moreover, by combining insights into the processes that underlie typical human interactions, spoken dialogue modelling also seeks to advance our knowledge and understanding of the principles that govern communicative situations in general.
