Multimodal Prediction of Psychological Disorders: Learning Verbal and Nonverbal Commonalities in Adjacency Pairs

Semi-structured interviews are widely used in medical settings to gather information from individuals about psychological disorders such as depression or anxiety. These interviews typically consist of a series of question-response pairs, which we refer to as adjacency pairs. We propose a computational model, the Multimodal HCRF, that exploits the commonalities among adjacency pairs and combines information from multiple modalities to infer the psychological state of the interviewee. We collect data and perform experiments on a human-to-virtual-human interaction dataset. In predicting depression from semi-structured interviews, our multimodal approach significantly outperforms conventional holistic approaches that ignore adjacency-pair context.
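The data representation the abstract describes can be pictured as a sequence of adjacency-pair observations, each carrying fused verbal and nonverbal features, fed in order to a sequence model such as an HCRF. The following is a minimal, hypothetical sketch of that structuring step only; the names (`AdjacencyPair`, `fuse_pair`, `interview_sequence`) and the early-fusion-by-concatenation choice are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: an interview as an ordered sequence of adjacency
# pairs, each with verbal and nonverbal feature vectors. All names are
# illustrative; the paper's Multimodal HCRF consumes such sequences.

from dataclasses import dataclass
from typing import List


@dataclass
class AdjacencyPair:
    question_id: str        # which interview question this pair answers
    verbal: List[float]     # e.g. lexical / dialogue features of the response
    nonverbal: List[float]  # e.g. acoustic and facial features


def fuse_pair(pair: AdjacencyPair) -> List[float]:
    """Early fusion by concatenating verbal and nonverbal features."""
    return pair.verbal + pair.nonverbal


def interview_sequence(pairs: List[AdjacencyPair]) -> List[List[float]]:
    """One fused observation per adjacency pair, preserving interview
    order so a sequence model (e.g. an HCRF) can exploit commonalities
    across pairs."""
    return [fuse_pair(p) for p in pairs]


pairs = [
    AdjacencyPair("q1", [0.2, 0.5], [0.1, 0.0, 0.3]),
    AdjacencyPair("q2", [0.4, 0.1], [0.2, 0.1, 0.0]),
]
seq = interview_sequence(pairs)
print(len(seq), len(seq[0]))  # 2 observations, each 5-dimensional
```

Keeping one observation per adjacency pair, rather than pooling the whole interview into a single vector, is what lets a sequence model condition on which question elicited each response.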
