Optimizing the turn-taking behavior of task-oriented spoken dialog systems

Even as progress in speech technologies and task and dialog modeling has allowed the development of advanced spoken dialog systems, the low-level interaction behavior of those systems often remains rigid and inefficient. Based on an analysis of human-human and human-computer turn-taking in naturally occurring task-oriented dialogs, we define a set of features that can be automatically extracted and show that they can be used to inform efficient end-of-turn detection. We then frame turn-taking as decision making under uncertainty and describe the Finite-State Turn-Taking Machine (FSTTM), a decision-theoretic model that combines data-driven machine learning methods and a cost structure derived from Conversation Analysis to control the turn-taking behavior of dialog systems. Evaluation results on CMU Let's Go, a publicly deployed bus information system, confirm that the FSTTM significantly improves the responsiveness of the system compared to a standard threshold-based approach, as well as previous data-driven methods.
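The abstract frames turn-taking as a choice made during a user pause: grab the floor or keep waiting, with each choice weighed by the cost of an overlap or of a gap. The Python sketch below contrasts a fixed-threshold endpointer with an expected-cost rule of that kind. It is illustrative only and is not the authors' FSTTM implementation. The cost values, the 700 ms threshold, the exponential model of within-turn pauses, and every function and parameter name (`Costs`, `threshold_endpoint`, `p_released`, `should_grab`, `endpoint_time`) are assumptions. The `prior` argument stands in for the probability, estimated from features available when the pause begins, that the user has finished the turn.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical cost settings (illustrative values, not the paper's)."""
    cut_in: float = 1000.0        # cost of grabbing the floor while the user still holds it
    latency_per_ms: float = 1.0   # cost per ms of silence once the user has released the floor


def threshold_endpoint(pause_ms: float, threshold_ms: float = 700.0) -> bool:
    """Baseline: declare end of turn once silence reaches a fixed threshold."""
    return pause_ms >= threshold_ms


def p_released(prior: float, pause_ms: float, mean_pause_ms: float = 500.0) -> float:
    """Posterior that the user has released the floor after `pause_ms` of silence.

    Assumption: pauses *within* a turn are exponentially distributed with mean
    `mean_pause_ms`, so a user who still holds the floor stays silent at least
    this long with probability exp(-pause_ms / mean_pause_ms).
    """
    still_silent_if_holding = math.exp(-pause_ms / mean_pause_ms)
    return prior / (prior + (1.0 - prior) * still_silent_if_holding)


def should_grab(prior: float, pause_ms: float, costs: Costs,
                mean_pause_ms: float = 500.0) -> bool:
    """Grab the floor when its expected cost is lower than that of waiting."""
    p = p_released(prior, pause_ms, mean_pause_ms)
    cost_grab = (1.0 - p) * costs.cut_in             # risk of cutting the user off
    cost_wait = p * costs.latency_per_ms * pause_ms  # gap already incurred if the floor is free
    return cost_grab < cost_wait


def endpoint_time(prior: float, costs: Costs, mean_pause_ms: float = 500.0,
                  step_ms: float = 10.0, max_ms: float = 5000.0) -> float:
    """Earliest pause duration (ms, on a step_ms grid) at which the rule grabs.

    Returns max_ms if the rule never grabs within the search window.
    """
    t = 0.0
    while t <= max_ms:
        if should_grab(prior, t, costs, mean_pause_ms):
            return t
        t += step_ms
    return max_ms


if __name__ == "__main__":
    costs = Costs()
    for prior in (0.9, 0.3):
        print(f"prior={prior}: expected-cost rule ends the turn at "
              f"{endpoint_time(prior, costs)} ms; fixed threshold at 700 ms")
```

With these made-up values, a confident prior of 0.9 ends the turn after 100 ms of silence, while an uncertain prior of 0.3 waits somewhat over 600 ms; the fixed threshold treats both pauses identically at 700 ms. Because grabbing gets cheaper and waiting gets costlier as the pause lengthens, the rule crosses over exactly once, so strong end-of-turn evidence shortens latency and ambiguous pauses buy time before risking a cut-in.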

Maxine Eskénazi | Antoine Raux
