A Finite-State Turn-Taking Model for Spoken Dialog Systems

This paper introduces the Finite-State Turn-Taking Machine (FSTTM), a new model for controlling the turn-taking behavior of conversational agents. Based on a non-deterministic finite-state machine, the FSTTM uses a cost matrix and decision-theoretic principles to select a turn-taking action at any point in time. We show how the model can be applied to the problem of end-of-turn detection. Evaluation results on a deployed spoken dialog system show that the FSTTM provides significantly higher responsiveness than previous approaches.
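The idea of selecting an action from a cost matrix can be illustrated with a minimal sketch. It is not the paper's implementation: the two floor states, the two actions, and every cost value below are hypothetical placeholders chosen for illustration. The sketch assumes the system holds a probability distribution (a belief) over who currently has the floor and picks the action with the lowest expected cost.

```python
from typing import Dict

# Hypothetical floor states and system actions (illustrative only,
# not the state or action inventory defined in the paper).
STATES = ("USER", "FREE")
ACTIONS = ("WAIT", "GRAB")

# Hypothetical cost matrix: COST[action][state].
# Waiting while the floor is free adds latency; grabbing the floor
# while the user still holds it is a cut-in and is penalized more.
COST: Dict[str, Dict[str, float]] = {
    "WAIT": {"USER": 0.0, "FREE": 1.0},
    "GRAB": {"USER": 5.0, "FREE": 0.0},
}


def expected_cost(action: str, belief: Dict[str, float]) -> float:
    """Expected cost of an action under a belief over floor states."""
    return sum(belief[state] * COST[action][state] for state in STATES)


def select_action(belief: Dict[str, float]) -> str:
    """Return the action with the lowest expected cost."""
    return min(ACTIONS, key=lambda action: expected_cost(action, belief))


if __name__ == "__main__":
    # Early in a pause the user probably still holds the floor.
    print(select_action({"USER": 0.9, "FREE": 0.1}))
    # After a longer pause the floor is probably free.
    print(select_action({"USER": 0.1, "FREE": 0.9}))
```

With these placeholder costs, the sketch keeps waiting until the probability that the floor is free exceeds 5/6, which shows how the relative size of the cut-in and latency costs sets the trade-off between responsiveness and interruptions.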
