Neural Networks and TEO Features for an Automatic Recognition of Stress in Spontaneous Speech

This study presents methods for automatic stress recognition based on acoustic speech analysis. It introduces novel feature extraction approaches based on the nonlinear Teager energy operator (TEO) calculated within critical bands, discrete wavelet transform bands, and wavelet packet bands. Classification was performed using two types of neural networks: the multilayer perceptron neural network (MLPNN) and the probabilistic neural network (PNN). Classification performance was tested on the actual stress dataset from the SUSAS database, in which 15 speakers (8 females and 7 males) read a list of 35 words under three actual conditions: high stress, low stress, and neutral. The best overall performance was observed for the TEO features calculated within perceptual wavelet packet bands (TEO-PWP). Depending on the type of mother wavelet, correct classification scores for the PWP features ranged from 71.24% to 91.56% with the MLPNN classifier and from 86.63% to 93.67% with the PNN classifier. The PNN classifier outperformed the MLPNN classifier.
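
The abstract does not state the operator itself, so the following is only a minimal sketch of the standard discrete-time Teager energy operator, Psi[x(n)] = x(n)^2 - x(n-1) x(n+1), written in Python with NumPy. The function name `teager_energy` and the test tone are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's critical-band or wavelet-packet decomposition; in such a scheme the operator would presumably be applied to each band signal separately.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration. The first and last samples have
    no two-sided neighbours, so the output is two samples shorter than x.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("teager_energy expects a 1-D signal with at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Sanity check on a pure tone (assumed parameters): for
# x[n] = A * cos(omega * n + phi) the operator returns the constant
# A**2 * sin(omega)**2, independent of n and phi.
amplitude, omega, phase = 2.0, 0.3, 0.5
n = np.arange(200)
tone = amplitude * np.cos(omega * n + phase)

psi = teager_energy(tone)
expected = amplitude ** 2 * np.sin(omega) ** 2
```

For a single sinusoid the output is constant and grows with both amplitude and frequency, which is why the operator is usually applied to band-limited signals rather than to the full-band waveform.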
