Incremental Multimodal Feedback for Conversational Agents

Just like humans, conversational computer systems should not listen silently to their input and only then respond. Instead, they should enforce the speaker-listener link by attending actively and giving feedback on an utterance while perceiving it. Most existing systems produce feedback as a direct response to decisive (e.g., prosodic) cues. We present a framework that conceives of feedback as a more complex system, resulting from the interplay of conventionalized responses to eliciting speaker events and the multimodal behavior that signals how the listener's internal states evolve. We describe a model for producing such incremental feedback, based on multi-layered processes for perceiving, understanding, and evaluating input.
