Response Timing Detection Using Prosodic and Linguistic Information for Human-friendly Spoken Dialog Systems (Special Issue: Information Systems in Symbiosis with Humans)

Summary: If a dialog system can respond to the user as appropriately as a human does, the interaction becomes smoother. As in human-human interaction, the timing of responses such as back-channels and turn-taking plays an important role in smooth dialog. We developed a response timing generator for such a dialog system. The generator uses a decision tree to detect response timing from features derived from prosodic and linguistic information. During the user's pause, it decides the system's action every 100 ms. In this paper, we describe a robust spoken dialog system that uses the timing generator. In a subjective evaluation, almost all of the subjects reported that the system felt friendly.
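The 100 ms decision cycle described in the summary can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the feature names (`f0_slope`, `utterance_complete`), the three action labels, the thresholds, and the hand-written `decide` function, which merely stands in for a decision tree learned from data.

```python
from dataclasses import dataclass
from enum import Enum

STEP_MS = 100  # decision interval during the user's pause, as stated in the summary


class Action(Enum):
    WAIT = "wait"
    BACKCHANNEL = "backchannel"
    TAKE_TURN = "take_turn"


@dataclass
class PauseFeatures:
    pause_ms: int             # elapsed pause length so far
    f0_slope: float           # hypothetical prosodic cue: pitch slope before the pause
    utterance_complete: bool  # hypothetical linguistic cue: utterance looks finished


def decide(f: PauseFeatures) -> Action:
    """Hand-written stand-in for a learned decision tree; thresholds are invented."""
    if f.utterance_complete:
        return Action.TAKE_TURN if f.pause_ms >= 300 else Action.WAIT
    if f.f0_slope < 0 and f.pause_ms >= 200:
        return Action.BACKCHANNEL
    return Action.WAIT


def respond_during_pause(pause_total_ms: int, f0_slope: float,
                         utterance_complete: bool) -> tuple[int, Action]:
    """Re-evaluate the tree every STEP_MS until it returns a non-WAIT action."""
    for elapsed in range(STEP_MS, pause_total_ms + 1, STEP_MS):
        action = decide(PauseFeatures(elapsed, f0_slope, utterance_complete))
        if action is not Action.WAIT:
            return elapsed, action
    # The pause ended (the user resumed speaking) before any response was triggered.
    return pause_total_ms, Action.WAIT
```

Calling `respond_during_pause(500, -20.0, False)` walks through the pause in 100 ms steps and returns as soon as the stand-in tree chooses an action other than waiting. A real system would compute the features from live audio and recognition results at each step instead of receiving fixed values.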
