Analysis and Predictive Modeling of Body Language Behavior in Dyadic Interactions From Multimodal Interlocutor Cues

During dyadic interactions, participants continuously adjust their behavior and give feedback in response to the behavior of their interlocutors and the interaction context. In this paper, we study how a participant in a dyadic interaction adapts their body language to the behavior of the interlocutor, given the interaction goals and context. We apply a variety of psychology-inspired body language features to describe body motion and posture. We first examine the behavioral coordination within the dyad for two interaction stances: friendly and conflictive. The analysis empirically reveals this coordination and helps identify interlocutor features that are informative with respect to the participant's target body language features. The coordination patterns are found to depend on the interaction stance assumed. We then apply a Gaussian-Mixture-Model-based (GMM) statistical mapping, in combination with a Fisher kernel framework, to automatically predict the body language of an interacting participant from the speech and gesture behavior of the interlocutor. The experimental results show that the Fisher kernel-based approach outperforms both the GMM-based mapping alone and support vector regression, in terms of correlation coefficient and RMSE. These results suggest a significant level of predictability of body language behavior from interlocutor cues.
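
The GMM-based statistical mapping described above amounts to fitting a joint density over the interlocutor's and the participant's feature streams, then predicting the participant's features as the conditional expectation under that density. Below is a minimal sketch of this idea, assuming scikit-learn's GaussianMixture and hypothetical frame-level feature matrices X (interlocutor cues) and Y (participant body language); the paper's actual feature extraction, dynamic (delta) features, and Fisher kernel stage are not reproduced here.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def fit_joint_gmm(X, Y, n_components=8, seed=0):
    """Fit a full-covariance GMM over the joint [interlocutor, participant] space."""
    Z = np.hstack([X, Y])  # (n_frames, dx + dy)
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(Z)
    return gmm


def gmm_conditional_predict(gmm, X, dx):
    """Predict participant features as E[y | x] under the joint GMM.

    For each mixture component k with joint mean/covariance partitioned into
    x- and y-blocks, the conditional mean is
        mu_y^k + S_yx^k (S_xx^k)^{-1} (x - mu_x^k),
    and predictions are averaged with the posterior responsibilities of x.
    """
    mu_x = gmm.means_[:, :dx]
    mu_y = gmm.means_[:, dx:]
    S_xx = gmm.covariances_[:, :dx, :dx]
    S_yx = gmm.covariances_[:, dx:, :dx]

    # Posterior responsibilities of each component given x (marginal over y).
    log_resp = np.stack(
        [multivariate_normal.logpdf(X, mu_x[k], S_xx[k])
         for k in range(gmm.n_components)], axis=1)
    log_resp += np.log(gmm.weights_)
    resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)

    # Responsibility-weighted sum of per-component conditional means.
    preds = np.zeros((X.shape[0], mu_y.shape[1]))
    for k in range(gmm.n_components):
        gain = S_yx[k] @ np.linalg.inv(S_xx[k])
        preds += resp[:, k:k + 1] * (mu_y[k] + (X - mu_x[k]) @ gain.T)
    return preds


# Toy usage with synthetic, correlated streams (stand-ins for real features).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
Y = X @ rng.standard_normal((4, 2)) + 0.1 * rng.standard_normal((500, 2))
gmm = fit_joint_gmm(X[:400], Y[:400])
Y_hat = gmm_conditional_predict(gmm, X[400:], dx=4)
```

The Fisher kernel stage reported in the abstract would build on such a generative model rather than replace it: gradients of the GMM log-likelihood with respect to the model parameters serve as fixed-length descriptors of an observation sequence, which can then be fed to a discriminative regressor. The comparison in the paper is between that combined approach, the conditional mapping alone, and support vector regression on the raw features.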
