Classification of Stressed Speech using Gaussian Mixture Model

In this work, several speech features, namely Sinusoidal Frequency Features (SFF), Sinusoidal Amplitude Features (SAF), Cepstral Coefficients (CC), and Mel Frequency Cepstral Coefficients (MFCC), are evaluated to determine how effectively each represents stressed speech. The features are assessed with several statistical evaluation techniques: probability density characteristics, the F-ratio test, the Kolmogorov-Smirnov (KS) test, and a Vector Quantization (VQ) classifier. A novel statistical Feature Discrimination Measure (FDM) is proposed for the same purpose. A Gaussian Mixture Model (GMM) classifier is tested for recognizing different stress levels in a speech signal. The Speech Under Simulated Emotion (SUSE) database is used for the stress analysis. With both the GMM and VQ classifiers, SAF gives the best recognition results, followed by SFF, MFCC, and CC, in that order. The FDM values and the KS test suggest similar performance trends for the speech features. The F-ratio values, in contrast, indicate the best performance for SFF, followed by SAF, MFCC, and CC.
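Two of the evaluation techniques named above, the F-ratio and the two-sample KS statistic, can be sketched for a single scalar feature. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract gives no formulas, so the F-ratio is taken here in its common form (variance of the class means divided by the mean within-class variance, using population variances), and the function names `f_ratio` and `ks_statistic` and the toy values are hypothetical. The proposed FDM is not reproduced because the abstract does not define it.

```python
from bisect import bisect_right
from statistics import fmean, pvariance


def f_ratio(classes):
    """F-ratio of one scalar feature (assumed definition).

    classes: one list of feature values per stress class.
    Returns the variance of the class means divided by the mean
    within-class variance; larger values mean better separation.
    """
    class_means = [fmean(values) for values in classes]
    within = fmean([pvariance(values) for values in classes])
    return pvariance(class_means) / within


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of
    samples a and b; the gap can only change at observed values, so
    checking the pooled sample points is sufficient.
    """
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Toy, made-up feature values for two hypothetical stress classes.
neutral = [1.0, 2.0, 3.0]
stressed = [4.0, 5.0, 6.0]
print(f_ratio([neutral, stressed]), ks_statistic(neutral, stressed))
```

Under these assumptions, a feature whose class-conditional distributions barely overlap yields a large F-ratio and a KS statistic near 1, while a feature with identical distributions across classes yields a KS statistic of 0.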
