Perceptual evaluation of backchannel strategies for artificial listeners

Artificial listeners are virtual agents that can listen attentively to a human speaker in a dialog. In this paper, we present two experiments in which we investigate the perception of rule-based backchannel strategies for artificial listeners. In both, we collect subjective judgements from humans who observe a video of a speaker together with a corresponding animation of an artificial listener. In the first experiment, we evaluate six rule-based strategies that differ in the types of features (e.g. prosody, gaze) they consider. The ratings are given at the level of a speech turn and can be regarded as a measure of how human-like the generated listening behavior is perceived to be. In the second experiment, we systematically investigate the effect of the quantity, type, and timing of backchannels within the discourse of the speaker. Additionally, we ask human observers to press a button whenever they consider a generated backchannel occurrence inappropriate. Together, the two experiments give insight into the factors, from both an observation and a generation point of view, that influence the perception of backchannel strategies for artificial listeners.
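To make the notion of a rule-based backchannel strategy concrete, the sketch below implements one common prosody-based rule of the kind evaluated here: trigger a backchannel when a short pause follows a region of falling pitch. This is a minimal illustration under assumed parameters, not the exact rules used in our experiments; the frame rate, thresholds, and the function name `backchannel_frames` are choices made for the example.

```python
# Minimal sketch of a single prosody-based backchannel rule: emit a
# backchannel when a short pause follows a region of falling pitch.
# Input: a pitch track sampled at a fixed frame rate, with unvoiced
# frames represented as None. All parameter values are assumptions.

from typing import List, Optional

FRAME_RATE = 100                        # pitch frames per second (assumed)
PAUSE_FRAMES = int(0.2 * FRAME_RATE)    # >= 200 ms of silence counts as a pause
FALL_WINDOW = int(0.3 * FRAME_RATE)     # inspect the last 300 ms of voiced speech
FALL_THRESHOLD_HZ = 10.0                # required pitch drop across the window

def backchannel_frames(pitch: List[Optional[float]]) -> List[int]:
    """Return the frame indices at which this rule would trigger a backchannel."""
    triggers: List[int] = []
    unvoiced_run = 0        # length of the current run of unvoiced frames
    voiced: List[float] = []  # pitch values of voiced frames seen so far
    for i, f0 in enumerate(pitch):
        if f0 is None:
            unvoiced_run += 1
            # Fire exactly once, at the moment the pause reaches threshold length,
            # and only if the preceding voiced stretch showed falling pitch.
            if unvoiced_run == PAUSE_FRAMES and len(voiced) >= FALL_WINDOW:
                window = voiced[-FALL_WINDOW:]
                if window[0] - window[-1] >= FALL_THRESHOLD_HZ:
                    triggers.append(i)
        else:
            unvoiced_run = 0
            voiced.append(f0)
    return triggers
```

Applied to a pitch track extracted with, for example, Praat, the function returns the frames at which this single rule would place a backchannel. The strategies compared in the first experiment can be thought of as different combinations of such feature-based rules, e.g. adding conditions on the speaker's gaze.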
