Toward Construction of Spoken Dialogue System that Evokes Users’ Spontaneous Backchannels

This paper takes a first step toward a spoken dialogue system that evokes users' spontaneous backchannels. We construct an HMM-based dialogue-style text-to-speech (TTS) system that generates human-like cues that evoke users' backchannels. A spoken dialogue system for information navigation was implemented, and the TTS was evaluated in terms of the user backchannels it evoked. User experiments demonstrated that the backchannels evoked by our TTS are more informative for the system in detecting users' feelings than those evoked by a conventional reading-style TTS.
