Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech

Abstract – Research has shown that the voice itself contains important information about a speaker's immediate psychological state, and that certain vocal parameters can distinguish the speaking patterns of speech affected by emotional disturbances (e.g., clinical depression). In this study, a GMM-based feature of the vocal tract system response and spectral energy were studied and found to form a primary acoustic feature set for separating two groups of female patients: those diagnosed with depression and those at suicidal risk.

Index Terms: suicidal speech, depression, vocal tract, energy

1. Introduction

Suicide is a common outcome in persons with serious mental disorders. However, it remains under-researched and poorly understood, and methods to help identify persons at elevated risk are sorely needed in clinical practice. This study attempts to identify characteristic vocal patterns in persons with imminent suicidal potential, which could lead to new technology to aid in assessing suicidal potential. The project studies vocal acoustic properties in suicidal states by contrasting two study groups: near-term suicidal and depressed.

In the early 1980s, the Silvermans began to collect and analyze recorded suicide notes and interviews made shortly before suicide attempts. Their results suggested that the voice can provide important information about immediate psychological state. They reported that the speech of depressed patients resembles that of suicidal patients, but that the tonal quality of speech changes significantly when patients become suicidal. As reported in [1], [2], [3], emotional arousal changes the speech production scheme by affecting the respiratory, phonatory, and articulatory processes, which are in turn encoded in the acoustic signal.
The emotional content of the voice can be associated with acoustic variables such as the level, range, contour, and perturbation of the fundamental frequency; the distribution of energy in the frequency spectrum; the location, bandwidth, and intensity of formant frequencies; and a variety of temporal measures. Measurable changes in vocal parameters caused by emotional disturbances can be evaluated with an appropriate speech processing approach based on suitable acoustic features. Research has shown that depression has a major effect on the acoustic characteristics of the voice compared with normal controls, and certain changes in the acoustic properties of affective speech may be specific to near-term suicidal states.

In published pilot studies [4], [6], analytical techniques were developed to determine whether subjects were in one of three mental states: healthy control, non-suicidal depressed, or high-risk suicidal. Several studies have used vocal tract (VT) measures (e.g., formants) and prosody to classify emotional disorders. France et al. [4] found the formants and the percentages of total energy in the frequency spectrum over the 0–2,000 Hz range to be the most discriminating acoustic feature set for classifying control, major depressed, and suicidal subjects. These features were recently re-investigated on a new speech database recorded in a better-controlled environment, and they were again found to be powerful acoustic discriminators of suicidal, depressed, and remitted patients [1]. Ozdas et al. [6] used a set of low-order mel-cepstral coefficients to identify speakers whom a psychiatrist had diagnosed as major depressed, suicidal, or normal, and reported high classification performance as a measure of group separation. Moore et al. compared prosody, formant, and glottal ratio/spectrum features for classifying normal controls and depressed patients; the optimal classifiers, based on the glottal ratio/spectrum and formant features, separated the two groups most effectively [7].

This work focuses on characterizing the vocal tract system and the distribution of energy in the frequency spectrum of the speech signal. A speech processing algorithm is proposed and implemented to extract vocal features that represent the characteristics of the VT system response. A smoothed magnitude spectrum is estimated via cepstral analysis, and the spectral structure of that magnitude spectrum is modeled by a mixture of Gaussian density components whose parameters are estimated with the well-known Expectation-Maximization (EM) algorithm.

This paper is organized as follows. Section 2 describes the database, feature extraction, primary feature selection, and performance evaluation. Section 3 presents the results, and Section 4 summarizes the findings.
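The band-energy feature attributed to France et al. [4] above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the 500 Hz band width, the use of total one-sided spectral energy as the denominator, and the helper names `dft` and `band_energy_percentages` are assumptions. A naive DFT from the Python standard library keeps the example self-contained; a practical pipeline would use an FFT and window each frame.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def band_energy_percentages(frame, fs, f_max=2000.0, band_width=500.0):
    """Percentage of total one-sided spectral energy (0..fs/2) that falls in
    each band of width `band_width` Hz between 0 and `f_max` Hz.

    Assumptions: band_width=500 Hz and a full-band (0..fs/2) denominator."""
    n = len(frame)
    spectrum = dft(frame)
    # One-sided power spectrum: bins 0..n//2 cover 0..fs/2.
    power = [abs(spectrum[k]) ** 2 for k in range(n // 2 + 1)]
    total = sum(power)
    n_bands = int(f_max // band_width)
    percentages = [0.0] * n_bands
    for k, p in enumerate(power):
        band = int((k * fs / n) // band_width)  # band index of this bin
        if band < n_bands:
            percentages[band] += 100.0 * p / total
    return percentages


if __name__ == "__main__":
    fs = 8000
    # A 750 Hz tone falls exactly on DFT bin 24 for N = 256, fs = 8 kHz.
    frame = [math.sin(2 * math.pi * 750 * t / fs) for t in range(256)]
    print([round(p, 1) for p in band_energy_percentages(frame, fs)])
    # prints [0.0, 100.0, 0.0, 0.0]
```

With four 500 Hz bands, the feature vector is simply the share of frame energy in 0–500, 500–1,000, 1,000–1,500, and 1,500–2,000 Hz.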
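The feature extraction described above (cepstral smoothing of the magnitude spectrum, then a Gaussian mixture fitted with EM) can be sketched as follows. This is a hypothetical illustration, not the authors' algorithm: the lifter length of 20, the number of components, the peak-based initialization, and the treatment of the normalized smoothed magnitude spectrum as a probability mass over frequency bins (so each bin acts as a weighted sample in EM) are all assumptions made for the sketch, as are the function names.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT using only the standard library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def smoothed_magnitude(frame, n_lifter=20):
    """Cepstrally smoothed magnitude spectrum, one-sided (bins 0..N/2).

    n_lifter (number of low-quefrency coefficients kept) is an assumed value."""
    n = len(frame)
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]
    cep = dft(log_mag, inverse=True)  # real cepstrum (imaginary parts ~ 0)
    # Low-quefrency lifter; keep the mirrored coefficients so the result stays real.
    lifted = [c if (q < n_lifter or q > n - n_lifter) else 0.0
              for q, c in enumerate(cep)]
    return [math.exp(v.real) for v in dft(lifted)[: n // 2 + 1]]


def fit_spectral_gmm(weights, freqs, n_components=4, n_iter=100):
    """Weighted EM for a 1-D Gaussian mixture over the frequency axis.

    Each bin is a sample whose weight is its normalized spectral magnitude."""
    total = sum(weights)
    w = [v / total for v in weights]
    m = len(freqs)
    span = freqs[-1] - freqs[0]
    var_floor = (freqs[1] - freqs[0]) ** 2  # keeps a component from collapsing
    # Hypothetical initialization: means at the largest local spectral peaks,
    # padded with evenly spaced positions if there are too few peaks.
    peaks = [i for i in range(1, m - 1) if w[i] > w[i - 1] and w[i] >= w[i + 1]]
    top = sorted(peaks, key=lambda i: w[i], reverse=True)[:n_components]
    means = [freqs[i] for i in top]
    fill = [freqs[0] + (j + 0.5) * span / n_components for j in range(n_components)]
    means = sorted(means + fill[: n_components - len(means)])
    variances = [max(var_floor, (span / (4 * n_components)) ** 2)] * n_components
    priors = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin.
        resp = []
        for f in freqs:
            p = [priors[j] * math.exp(-(f - means[j]) ** 2 / (2 * variances[j]))
                 / math.sqrt(2 * math.pi * variances[j]) for j in range(n_components)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: normalized bin weights play the role of sample counts.
        for j in range(n_components):
            nj = sum(w[i] * resp[i][j] for i in range(m))
            priors[j] = nj
            if nj < 1e-12:
                continue  # empty component: keep its mean and variance
            means[j] = sum(w[i] * resp[i][j] * freqs[i] for i in range(m)) / nj
            variances[j] = max(var_floor, sum(w[i] * resp[i][j] * (freqs[i] - means[j]) ** 2
                                              for i in range(m)) / nj)
    order = sorted(range(n_components), key=lambda j: means[j])
    return ([priors[j] for j in order], [means[j] for j in order],
            [variances[j] for j in order])


if __name__ == "__main__":
    fs, n = 8000, 256
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    frame = [math.sin(2 * math.pi * 750 * t / fs) * hamming[t] for t in range(n)]
    mag = smoothed_magnitude(frame)
    freqs = [k * fs / n for k in range(len(mag))]
    print(fit_spectral_gmm(mag, freqs, n_components=2))
```

The fitted means, variances, and weights of the mixture then summarize where spectral energy concentrates (broadly, formant-like regions) in a fixed-length vector that a classifier can use.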

[1] Ehud Weinstein et al., Parameter estimation of superimposed signals using the EM algorithm, 1988, IEEE Trans. Acoust. Speech Signal Process.

[2] A.P. Benguerel et al., Speech analysis, 1981, Proceedings of the IEEE.

[3] D. Mitchell Wilkes et al., Objective estimation of suicidal risk using vocal output characteristics, 2006, INTERSPEECH.

[4] M. Mathews et al., Talker-recognition procedure based on analysis of variance, 1963.

[5] G. Fairbanks, Voice and articulation drillbook, 1960.

[6] Jae S. Lim, Spectral root homomorphic deconvolution system, 1979, ICASSP.

[7] D. Mitchell Wilkes et al., Acoustical properties of speech as indicators of depression and suicidal risk, 2000, IEEE Transactions on Biomedical Engineering.

[8] K. Scherer, Vocal correlates of emotional arousal and affective disturbance, 1989.

[9] K. Scherer, Vocal affect expression: a review and a model for future research, 1986, Psychological Bulletin.

[10] Thomas W. Parsons et al., Voice and speech processing, 1986.

[11] D. Mitchell Wilkes et al., Analysis of vocal tract characteristics for near-term suicidal risk assessment, 2004, Methods of Information in Medicine.

[12] M. Landau, Acoustical properties of speech as indicators of depression and suicidal risk, 2008.

[13] J. Peifer et al., Comparing objective feature statistics of speech for classifying clinical depression, 2004, The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[14] D. Rubin et al., Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.