Improved Classification of Speaking Styles for Mental Health Monitoring Using Phoneme Dynamics

This paper investigates the usefulness of segmental phoneme dynamics for the classification of speaking styles. We modeled transition details based on the phoneme sequences emitted by a speech recognizer, using recordings of 39 depressed patients covering 7 speaking styles: normal, pressured, slurred, stuttered, flat, slow, and fast speech. We designed and compared two sets of phoneme models: a language model treating each phoneme as a word unit (one for each style) and a context-dependent phoneme duration model based on Gaussians for each speaking style considered. The experiments showed that language modeling at the phoneme level performed better than the duration model. We also found that performance improves further with user normalization. To assess the complementary effect of the phoneme-based models, the classifiers were combined at the decision level with a Hidden Markov Model (HMM) classifier built from spectral features. The improvement was 5.7% absolute (10.4% relative), reaching 60.3% accuracy in 7-class and 71.0% in 4-class classification.
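The style-dependent phoneme language model described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a bigram model with simple add-one (Laplace) smoothing where the paper would use a full LM toolkit, and all class names and training sequences are hypothetical. The classifier picks the speaking style whose phoneme LM assigns the recognized phoneme sequence the highest likelihood.

```python
import math
from collections import defaultdict


class PhonemeBigramLM:
    """Bigram language model over phoneme symbols, treating each
    phoneme as a word unit (one such model is trained per style)."""

    def __init__(self):
        self.bigram = defaultdict(lambda: defaultdict(int))  # counts of (prev -> phoneme)
        self.context = defaultdict(int)                      # counts of prev contexts
        self.vocab = set()

    def train(self, sequences):
        # Each sequence is a list of phoneme labels from the recognizer.
        for seq in sequences:
            prev = "<s>"
            for ph in seq:
                self.bigram[prev][ph] += 1
                self.context[prev] += 1
                self.vocab.add(ph)
                prev = ph

    def log_prob(self, seq):
        # Add-one smoothing stands in for the more careful discounting
        # a real LM toolkit would apply to unseen phoneme bigrams.
        v = len(self.vocab) + 1
        lp, prev = 0.0, "<s>"
        for ph in seq:
            lp += math.log((self.bigram[prev][ph] + 1) / (self.context[prev] + v))
            prev = ph
        return lp


def classify(models, seq):
    """Return the style whose phoneme LM scores the sequence highest."""
    return max(models, key=lambda style: models[style].log_prob(seq))


# Toy usage with two hypothetical styles and made-up phoneme strings.
training = {
    "slow": [["ax", "b", "ax", "b"]],
    "fast": [["k", "t", "k", "t"]],
}
models = {}
for style, data in training.items():
    lm = PhonemeBigramLM()
    lm.train(data)
    models[style] = lm

print(classify(models, ["k", "t", "k"]))  # → fast
```

A per-style duration model would score the same recognizer output differently, from Gaussians over phoneme durations rather than transition statistics; combining such scores with an HMM spectral classifier at the decision level is what yields the reported gains.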
