How to categorize emotional speech signals with respect to the speaker's degree of emotional intensity

Automatic classification of the emotional content of speech signals has recently become an important research topic. Most work in this field aims to improve the correct classification rate (CCR) achieved by the proposed techniques. However, a literature review shows that there is no notable research on finding appropriate parameters that are related to the intensity of emotions. In this article, we investigate features suitable for recognizing emotional speech utterances according to their intensity. To this end, 4 emotional classes of the Berlin Emotional Speech database (happiness, anger, fear, and boredom) are evaluated at high and low intensity degrees. Using different classifiers, a CCR of about 70% is obtained. Moreover, a 10-fold cross-validation procedure is used to enhance the consistency of the results.
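As an illustration of how a CCR can be estimated with 10-fold cross-validation, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`k_fold_indices`, `correct_classification_rate`, `cross_validated_ccr`) and the toy nearest-mean classifier on a single scalar feature are hypothetical, chosen only to show the evaluation procedure.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def correct_classification_rate(y_true, y_pred):
    """CCR: fraction of utterances whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def cross_validated_ccr(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Average CCR over k folds; each fold serves as the test set exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        predictions = [predict_fn(model, features[i]) for i in test_idx]
        scores.append(correct_classification_rate(
            [labels[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)


# Hypothetical stand-in classifier: assign the class whose training mean is
# closest to the (scalar) feature value. Any classifier could be plugged in.
def train_nearest_mean(xs, ys):
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_nearest_mean(model, x):
    return min(model, key=lambda y: abs(x - model[y]))


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo, not from the database):
    # one feature value per utterance, labeled "low" or "high" intensity.
    feats = [0.01 * i for i in range(20)] + [10.0 + 0.01 * i for i in range(20)]
    labs = ["low"] * 20 + ["high"] * 20
    print(cross_validated_ccr(feats, labs, train_nearest_mean, predict_nearest_mean))
```

Averaging over folds means every utterance is tested exactly once by a model that never saw it during training, which is why the procedure gives a more stable estimate than a single train/test split.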
