Predicting couple therapy outcomes based on speech acoustic features

Automated assessment and prediction of marital outcomes in couples therapy is a challenging task, but it promises to be a useful tool for clinical psychologists. Computational approaches that infer therapy outcomes from observable behavioral information in conversations between spouses offer an objective means of understanding relationship dynamics. In this work, we explore whether the acoustics of the spoken interactions of clinically distressed spouses provide information for assessing therapy outcomes. The therapy outcome prediction task includes detecting whether or not the relationship improved (posed as binary classification) and discerning varying levels of improvement or decline in relationship status (posed as multiclass recognition). We use each interlocutor’s acoustic speech signal characteristics, such as vocal intonation and intensity, both independently and in relation to one another, as cues for predicting the therapy outcome. We also compare this prediction performance with that obtained when standardized behavioral codes characterizing the relationship dynamics, provided by human experts, are used as features for automated classification. Our experiments, using data from a longitudinal clinical study of couples in distressed relationships, showed that predictions of relationship outcomes obtained directly from vocal acoustics are comparable or superior to those obtained using human-rated behavioral codes as prediction features. In addition, combining direct signal-derived features with manually coded behavioral features improved prediction performance in most cases, indicating that the information captured by humans and by machine algorithms is complementary. Moreover, considering the vocal properties of the interlocutors in relation to one another, rather than in isolation, proved important for improving automatic prediction. This finding supports the notion that behavioral outcome, like many other behavioral aspects, is closely related to the dynamics and mutual influence of the interlocutors during their interaction and to their resulting behavioral patterns.
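
As a rough illustration of what using each interlocutor's acoustic characteristics "both independently and in relation to one another" can mean in practice, the sketch below summarizes a frame-level contour (for example, pitch in Hz) for each speaker and then adds a simple cross-speaker comparison. The function names `speaker_functionals` and `dyadic_features`, the choice of statistics, and the absolute-difference comparison are all assumptions made for this example; they are not the feature set or the method used in the study.

```python
from statistics import mean, pstdev


def speaker_functionals(contour):
    """Summarize one speaker's frame-level contour with simple statistics.

    The three statistics here are illustrative choices, not the study's features.
    """
    return {
        "mean": mean(contour),
        "std": pstdev(contour),
        "range": max(contour) - min(contour),
    }


def dyadic_features(contour_a, contour_b):
    """Build a feature dictionary for a two-speaker interaction.

    For each statistic, keep the value for speaker A and speaker B
    (the "independent" view) and their absolute difference
    (a hypothetical "in relation to one another" view).
    """
    stats_a = speaker_functionals(contour_a)
    stats_b = speaker_functionals(contour_b)
    features = {}
    for name in stats_a:
        features[f"a_{name}"] = stats_a[name]
        features[f"b_{name}"] = stats_b[name]
        features[f"absdiff_{name}"] = abs(stats_a[name] - stats_b[name])
    return features


if __name__ == "__main__":
    # Toy pitch contours in Hz; real contours would come from an acoustic front end.
    example = dyadic_features([100, 120, 140], [200, 200, 200])
    for key, value in sorted(example.items()):
        print(key, round(value, 2))
```

A feature dictionary of this kind could then be passed to any standard classifier for the binary or multiclass outcome task; the absolute difference is only one of many possible ways to relate the two speakers.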
