Automatic analysis of speech F0 contour for the characterization of mood changes in bipolar patients

Abstract Bipolar disorders are characterized by mood swings ranging from mania to depression. A system able to monitor and possibly predict these changes would help improve therapy and prevent dangerous events. Speech may convey relevant information about a subject's mood, and there is growing interest in studying how it changes in the presence of mood disorders. In this work we present an automatic method to characterize fundamental frequency (F0) dynamics in the voiced parts of syllables. The method segments voiced sounds from running speech samples and estimates two categories of features. The first category is borrowed from Taylor's Tilt intonational model; however, the proposed features differ in meaning from Taylor's, since they are estimated from all voiced segments without any analysis of intonation. The second category captures the speed of change of F0. The proposed features are first estimated from an emotional speech database. We then present an analysis of speech samples acquired from eleven psychiatric patients experiencing different mood states and from eighteen healthy control subjects. Subjects performed a text-reading task and a picture-commenting task. The results on the emotional speech database indicate that the proposed features can discriminate between high- and low-arousal emotions; this was verified at both the single-subject and the group level. An intra-subject analysis of the bipolar patients highlighted significant changes in the features across mood states, although not for all subjects. The directions of the changes estimated for different patients experiencing the same mood swing were not consistent and were task-dependent. Interestingly, a single-subject analysis performed on healthy controls and on bipolar patients recorded twice with the same mood label yielded very few significant differences. In particular, very good specificity was observed for the Taylor-inspired features and for a subset of the second feature category, strengthening the significance of the results obtained with the patients. Although the number of enrolled patients is small, this work suggests that the proposed features might make a relevant contribution to the demanding research field of speech-based mood classifiers. Moreover, the results presented here indicate that a model of speech changes in bipolar patients might be subject-specific, and that a richer characterization of subject status could be necessary to explain the observed variability.
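
As a concrete illustration of the two feature categories, the minimal Python sketch below computes Tilt-inspired parameters and F0 rate-of-change statistics for a single voiced segment. It assumes an F0 contour has already been extracted and segmented into voiced stretches; the function and feature names (tilt_features, f0_speed_features, mean_abs_speed, max_abs_speed) are hypothetical and do not reproduce the paper's implementation. The formulas follow Taylor's published Tilt definitions, here applied to a whole voiced segment rather than to a detected intonational event, as the abstract describes.

    import numpy as np

    def tilt_features(f0, t):
        # Tilt-inspired parameters for one voiced segment (hedged sketch).
        # f0: F0 values (Hz); t: corresponding time stamps (s).
        peak = int(np.argmax(f0))
        a_rise = f0[peak] - f0[0]       # rise amplitude (Hz)
        a_fall = f0[peak] - f0[-1]      # fall amplitude (Hz)
        d_rise = t[peak] - t[0]         # rise duration (s)
        d_fall = t[-1] - t[peak]        # fall duration (s)
        amp_sum = abs(a_rise) + abs(a_fall)
        dur_sum = d_rise + d_fall
        tilt_amp = (abs(a_rise) - abs(a_fall)) / amp_sum if amp_sum > 0 else 0.0
        tilt_dur = (d_rise - d_fall) / dur_sum if dur_sum > 0 else 0.0
        return {"tilt_amp": tilt_amp,
                "tilt_dur": tilt_dur,
                "tilt": 0.5 * (tilt_amp + tilt_dur)}

    def f0_speed_features(f0, t):
        # Rate of change of F0 within one voiced segment (names are illustrative).
        df0 = np.diff(f0) / np.diff(t)  # first-difference approximation, Hz/s
        return {"mean_abs_speed": float(np.mean(np.abs(df0))),
                "max_abs_speed": float(np.max(np.abs(df0)))}

    # Toy usage: a rise-fall F0 contour sampled every 10 ms.
    t = np.arange(0.0, 0.30, 0.01)
    f0 = 120.0 + 40.0 * np.sin(np.pi * t / 0.30)
    print(tilt_features(f0, t))
    print(f0_speed_features(f0, t))

In Taylor's formulation, a tilt value near zero indicates a symmetric rise-fall shape, while values toward +1 or -1 indicate a predominantly rising or falling segment; this is the kind of shape information the first feature category encodes, while the second summarizes how fast F0 moves regardless of shape.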
