Classification of stress in speech using linear and nonlinear features

Three systems for classifying stress in speech are proposed. The first uses linear short-time log frequency power coefficients (LFPC); the second uses nonlinear frequency-domain LFPC features (NFD-LFPC) based on the Teager energy operator (TEO); and the third uses TEO-based nonlinear time-domain LFPC features (NTD-LFPC). The systems were tested on the SUSAS (Speech Under Simulated and Actual Stress) database to classify five stress conditions individually. The system using LFPC gives the highest accuracy, followed by the system using NFD-LFPC features, while the system using NTD-LFPC features performs worst. With linear LFPC features, an average accuracy of 84% and a best accuracy of 95% were obtained in classifying the five stress categories.
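
As a rough illustration of the two ingredients named above, the following Python sketch computes the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and the log power of a frame in logarithmically spaced frequency bands. The abstract does not give the filter-bank layout, frame length, or the way the TEO is combined with the band powers in the NFD-LFPC and NTD-LFPC features, so the function names, the 12-band layout, the 100 Hz lower edge, and the Hamming window are all assumptions made for illustration, not the authors' configuration.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def log_band_powers(frame, sample_rate, n_bands=12, f_min=100.0):
    """Log power (dB) of one frame in log-spaced frequency bands.

    The band count and lower edge are illustrative assumptions,
    not values taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    # Power spectrum of the Hamming-windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Band edges spaced uniformly on a logarithmic frequency axis.
    edges = np.geomspace(f_min, sample_rate / 2.0, n_bands + 1)
    powers = np.empty(n_bands)
    for i in range(n_bands):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        powers[i] = spectrum[in_band].sum()
    # Small offset avoids log(0) for bands that contain no energy.
    return 10.0 * np.log10(powers + 1e-12)


# Usage: a 1 kHz tone sampled at 8 kHz, one 256-sample frame.
sample_rate = 8000
n = np.arange(256)
tone = np.cos(2.0 * np.pi * 1000.0 * n / sample_rate)
linear_features = log_band_powers(tone, sample_rate)
teo_profile = teager_energy(tone)
```

For a pure sinusoid A*cos(Omega*n + phi), the operator returns the constant A^2 * sin^2(Omega), which is why it is often described as tracking both the amplitude and the frequency of a signal.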
