Linking emotions to behaviors through deep transfer learning

Human behavior refers to the way humans act and interact. Understanding human behavior is a cornerstone of observational practice, especially in psychotherapy. An important cue for behavior analysis is the dynamic change of emotions during a conversation. Domain experts integrate emotional information in a highly nonlinear manner; thus, it is challenging to explicitly quantify the relationship between emotions and behaviors. In this work, we employ deep transfer learning to analyze the inferential capacity and contextual importance of emotions for behavior recognition. We first train a network to quantify emotions from acoustic signals and then use information from the emotion recognition network as features for behavior recognition. We treat this emotion-related information as behavioral primitives and further train higher-level layers towards behavior quantification. Through our analysis, we find that emotion-related information is an important cue for behavior recognition. Further, we investigate the importance of emotional context in the expression of behavior by training networks with and without a constrained contextual view of the data. This analysis demonstrates that the sequence of emotions is critical in behavior expression. To implement these frameworks, we employ hybrid architectures of convolutional and recurrent networks to extract emotion-related behavioral primitives and facilitate automatic behavior recognition from speech.
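The transfer pipeline described above (a frozen emotion network producing behavioral primitives, followed by behavior layers with a limited or full contextual view) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the convolution and recurrent weights are hypothetical fixed values standing in for a pretrained emotion network that stays frozen during transfer, the "acoustic" frames are synthetic toy data, and only a logistic output layer is trained, whereas the abstract describes training higher-level layers (which would also require backpropagation through time). The `context` argument of the hypothetical `contextual_features` function mimics constraining the network's contextual view: with `context=1` the resulting features ignore the order of the emotion primitives.

```python
import math
import random

# Hypothetical stand-ins for weights copied from a pretrained emotion network.
# In transfer learning these stay frozen while the behavior layer is trained.
EMOTION_KERNELS = [  # two 1-D conv filters, each spanning 3 frames x 2 features
    [[1 / 6, 1 / 6]] * 3,
    [[-1 / 6, -1 / 6]] * 3,
]
EMOTION_BIAS = [0.0, 0.0]
# Hypothetical frozen recurrent weights (hidden size 2).
W_X = [[1.0, 0.0], [0.0, 1.0]]
W_H = [[0.5, 0.0], [0.0, 0.5]]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def emotion_primitives(frames):
    """Frozen conv + ReLU: maps T x D frames to (T - K + 1) x F primitives."""
    k = len(EMOTION_KERNELS[0])
    out = []
    for t in range(len(frames) - k + 1):
        window = frames[t:t + k]
        row = []
        for kern, b in zip(EMOTION_KERNELS, EMOTION_BIAS):
            s = b + sum(w * x for kr, fr in zip(kern, window) for w, x in zip(kr, fr))
            row.append(max(0.0, s))
        out.append(row)
    return out


def recurrent_summary(seq):
    """Elman-style recurrence h_t = tanh(W_X x_t + W_H h_{t-1}); returns the final h."""
    h = [0.0] * len(W_H)
    for x in seq:
        h = [math.tanh(a + c) for a, c in zip(matvec(W_X, x), matvec(W_H, h))]
    return h


def contextual_features(primitives, context):
    """Run the recurrence over non-overlapping chunks of `context` primitives and
    average the chunk summaries. context=1 discards all ordering information;
    context=len(primitives) lets the recurrence see the whole sequence."""
    chunks = [primitives[i:i + context] for i in range(0, len(primitives), context)]
    states = [recurrent_summary(c) for c in chunks]
    return [sum(col) / len(states) for col in zip(*states)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_behavior_head(features, labels, lr=0.5, epochs=200):
    """Full-batch gradient descent on a logistic output layer (the only trained part here)."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    losses = []
    for _ in range(epochs):
        probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in features]
        losses.append(-sum(y * math.log(max(p, 1e-12)) + (1 - y) * math.log(max(1 - p, 1e-12))
                           for p, y in zip(probs, labels)) / n)
        grad_w = [sum((p - y) * x[j] for p, y, x in zip(probs, labels, features)) / n
                  for j in range(dim)]
        grad_b = sum(p - y for p, y in zip(probs, labels)) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, losses


def toy_session(label, rng, length=12):
    """Hypothetical toy data: 2-D frames, positive for label 1, negative for label 0."""
    sign = 1.0 if label == 1 else -1.0
    return [[sign * rng.uniform(0.5, 1.5) for _ in range(2)] for _ in range(length)]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1, 0] * 10
    sessions = [toy_session(y, rng) for y in labels]
    feats = [contextual_features(emotion_primitives(s), context=10) for s in sessions]
    w, b, losses = train_behavior_head(feats, labels)
    print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Training one head on `contextual_features(..., context=1)` and another on the full sequence, then comparing them, is one way to probe whether the order of emotions carries behavioral information, in the spirit of the constrained-versus-unconstrained comparison described in the abstract.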