Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients

In this chapter, different variants of Mel-frequency cepstral coefficients (MFCCs) describing human speech emotions are investigated. These features are tested and compared for their robustness in terms of classification accuracy and mean square error. Although MFCC is a reliable feature for speech emotion recognition, it does not capture the temporal dynamics between features, which are crucial for such analysis. To address this issue, delta MFCC, the first derivative of MFCC, is extracted for comparison. Because MFCC performs poorly under noisy conditions, both MFCC and delta MFCC features are extracted in the wavelet domain in the second phase. Combining the time–frequency characterization of emotions provided by wavelet analysis with the energy and amplitude information provided by MFCC-based features enhances the available information. Wavelet-based MFCCs (WMFCCs) and wavelet-based delta MFCCs (WDMFCCs) outperformed standard MFCCs, delta MFCCs, and wavelets in recognizing the Berlin database of emotional speech utterances. A probabilistic neural network (PNN) has been chosen to model the emotions, as this classifier is simple to train, faster, and allows more flexible selection of the smoothing parameter than other neural network (NN) models. The highest accuracy, 80.79%, has been observed with WDMFCCs, compared with 60.97% and 62.76% for MFCCs and wavelets, respectively.
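To illustrate the delta-MFCC idea, the sketch below shows one common way to estimate the first derivative of a static MFCC sequence using a regression over neighboring frames, d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..N} n²). This is a minimal, self-contained illustration of the general technique, not the chapter's exact implementation; the window half-width N=2, the clamped-edge padding, and the toy input values are assumptions for demonstration.

```python
def delta(frames, N=2):
    """Compute delta coefficients for a list of per-frame feature vectors.

    frames: list of equal-length lists (e.g., one MFCC vector per frame).
    N: regression window half-width (N=2 is a common default, assumed here).
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        vec = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, N + 1):
                # Edge frames are handled by clamping indices to the sequence,
                # one of several common padding conventions.
                c_plus = frames[min(t + n, T - 1)][k]
                c_minus = frames[max(t - n, 0)][k]
                acc += n * (c_plus - c_minus)
            vec.append(acc / denom)
        out.append(vec)
    return out

if __name__ == "__main__":
    # Toy "MFCC" sequence: 5 frames, 2 coefficients each (placeholder values,
    # not real speech features). Coefficient 0 rises linearly with slope 1,
    # so its interior delta values come out close to 1.0.
    mfcc = [[0.0, 1.0], [1.0, 1.5], [2.0, 2.0], [3.0, 2.5], [4.0, 3.0]]
    print(delta(mfcc))
```

In a real pipeline, `frames` would hold MFCC vectors extracted from short overlapping windows of the speech signal, and the resulting delta sequence would be appended to (or used alongside) the static features before classification.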