Robust emotional speech classification in the presence of babble noise

Emotional speech recognition (ESR) is a young field of research within human-computer interaction. Most studies in this field are performed in clean environments; in real-world conditions, however, disturbances such as car noise, background music, and buzz can degrade the performance of such recognition systems. One of the most common noises heard in everyday settings is babble noise. Because of its similarity to the desired speech, babble (or cross-talk) is highly challenging for speech-related systems. In this paper, in order to find the most appropriate features for ESR in the presence of babble noise at different signal-to-noise ratios, 286 features are extracted from utterances of two emotional speech datasets, one in German and one in Persian. The best features are then selected using different filter and wrapper methods. Finally, several classifiers, namely Bayesian, k-nearest neighbors (KNN), Gaussian mixture model (GMM), artificial neural network (ANN), and support vector machine (SVM), are applied to the selected features in two settings: multi-class and binary classification.
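The robustness evaluation described above requires mixing babble noise into clean utterances at a prescribed signal-to-noise ratio. As an illustrative sketch only (not the authors' code; the function name and use of NumPy are my own assumptions), the noise can be scaled so that 10·log10(P_signal / P_noise) equals the target SNR before adding it to the clean signal:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` so the result has the requested SNR in dB.

    Assumes both signals share the same sample rate; the noise is
    tiled/truncated to match the length of the clean utterance.
    """
    # Match the noise length to the clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    # Average power of each signal.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)

    # Gain g chosen so that 10*log10(p_clean / (g**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

Sweeping `snr_db` over a range of values (e.g. 20 dB down to 0 dB) then produces the noisy test conditions under which feature sets and classifiers can be compared.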
