Prediction of Visual Backchannels in the Absence of Visual Context Using Mutual Influence

Based on the phenomenon of mutual influence between participants in a face-to-face conversation, we propose a context-based prediction approach for modeling visual backchannels. Our goal is to create intelligent virtual listeners capable of providing backchannel feedback, enabling natural and fluid interactions. In our approach, we first anticipate the speaker's behaviors and then use this anticipated visual context to obtain more accurate listener backchannel moments. We model the mutual influence between speaker and listener gestures using a latent variable sequential model. We compared our approach with state-of-the-art prediction models on a publicly available dataset and showed the importance of modeling the mutual influence between the speaker and the listener.
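The two-stage idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's implementation: per-frame feature matrices, binary frame-level labels, and a simple logistic-regression classifier standing in for the latent variable sequential model (e.g., a latent-dynamic CRF) that the approach actually uses. Stage 1 anticipates the speaker's visual behaviors from the speaker's audio/lexical features; Stage 2 augments those features with the anticipated visual context to predict listener backchannel moments.

```python
# Hypothetical sketch of the two-stage pipeline; a per-frame logistic
# regression stands in for the paper's latent variable sequential model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_two_stage(speaker_feats, speaker_visual, backchannel_labels):
    """speaker_feats: (T, d) per-frame speaker audio/lexical features.
    speaker_visual: (T, k) binary per-frame speaker visual behaviors.
    backchannel_labels: (T,) binary listener backchannel labels."""
    # Stage 1: anticipate each speaker visual behavior from speaker features.
    stage1 = [
        LogisticRegression(max_iter=1000).fit(speaker_feats, speaker_visual[:, j])
        for j in range(speaker_visual.shape[1])
    ]
    # Anticipated visual context (predicted probabilities, since the true
    # speaker gestures are unavailable at prediction time).
    context = np.column_stack(
        [m.predict_proba(speaker_feats)[:, 1] for m in stage1]
    )
    # Stage 2: predict listener backchannel moments from the original
    # features augmented with the anticipated visual context.
    stage2 = LogisticRegression(max_iter=1000).fit(
        np.hstack([speaker_feats, context]), backchannel_labels
    )
    return stage1, stage2

def predict_backchannels(stage1, stage2, speaker_feats):
    # Apply Stage 1 to build the anticipated context, then score each
    # frame for a listener backchannel with Stage 2.
    context = np.column_stack(
        [m.predict_proba(speaker_feats)[:, 1] for m in stage1]
    )
    return stage2.predict_proba(np.hstack([speaker_feats, context]))[:, 1]
```

The key design point the sketch preserves is that Stage 2 never sees ground-truth speaker gestures, only Stage 1's anticipated context, which is what makes prediction possible in the absence of visual context.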
