"You made me do it": Classification of Blame in Married Couples' Interactions by Fusing Automatically Derived Speech and Language Information

One of the goals of behavioral signal processing is the automatic prediction of relevant high-level human behaviors from complex, realistic interactions. In this work, we analyze dyadic discussions of married couples and classify extreme instances (low/high) of blame expressed by one spouse toward the other. Since blame can be conveyed through multiple communicative channels (e.g., speech, language, gestures), we compare two classification methods in this paper. The first classifier is trained on conventional static acoustic features and models “how” the spouses spoke. The second is a novel classifier derived from automatic speech recognition output, which models “what” the spouses said. We obtain the best classification performance (82% accuracy) by exploiting the complementarity of these acoustic and lexical information sources through score-level fusion of the two classification methods.

Index Terms: behavioral signal processing (BSP), couple therapy, blame, acoustic features, lexical features, fusion
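To make the fusion step concrete, the following is a minimal, hypothetical sketch of score-level fusion: two independently trained classifiers whose posterior scores are combined by a weighted sum before thresholding. The scikit-learn SVMs, the feature matrices, the random data, and the fusion weight w are all illustrative assumptions, not the paper's actual features, classifiers, or tuning.

    # Hypothetical sketch of score-level fusion of an acoustic ("how") and a
    # lexical ("what") classifier. Data and models are placeholders.
    import numpy as np
    from sklearn.svm import SVC

    # X_acoustic: e.g., per-turn acoustic functionals (assumed shape)
    # X_lexical:  e.g., features derived from ASR transcripts (assumed shape)
    # y:          binary labels (0 = low blame, 1 = high blame)
    rng = np.random.default_rng(0)
    n = 200
    X_acoustic = rng.normal(size=(n, 50))
    X_lexical = rng.normal(size=(n, 30))
    y = rng.integers(0, 2, size=n)

    # Train the two channel-specific classifiers separately.
    acoustic_clf = SVC(probability=True).fit(X_acoustic, y)
    lexical_clf = SVC(probability=True).fit(X_lexical, y)

    def fuse_scores(xa, xl, w=0.5):
        """Weighted sum of the two classifiers' posterior scores for class 1."""
        p_a = acoustic_clf.predict_proba(xa)[:, 1]
        p_l = lexical_clf.predict_proba(xl)[:, 1]
        return w * p_a + (1.0 - w) * p_l

    fused = fuse_scores(X_acoustic, X_lexical, w=0.5)
    pred = (fused >= 0.5).astype(int)  # final low/high blame decision

In practice the fusion weight w would be tuned on held-out data rather than fixed at 0.5.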
