Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features

Nowadays, recognition of human emotion is a challenging yet important speech technology. In this paper, based on deriving prosody features from emotional speech, some voice quality features are proposed to be extracted as new emotional features to improve emotion recognition. Utilizing support vector machines classifier, four emotions from Chinese natural emotional speech corpus including anger, joy, sadness and neutral are discriminated by combining prosody and voice quality features. The experiment results show that combining prosody and voice quality features yields an overall accuracy of 76% for emotion recognition, which makes approximately 10% improvement compared with using the single prosody features. It also shows that voice quality features in speech are effective emotional features and can promote prosody features for improving emotion recognition results.