A new speech recognition method based on prosodic analysis and SVM for the Zhuang language

In this work, we have analyzed the speech signal at different levels for the task of recognizing emotions. Prosodic and spectral features are extracted separately from utterance, word, and syllable segments of speech. Word boundaries are identified manually, whereas syllable boundaries are identified using vowel onset points (VOPs). The combination of spectral and prosodic features was found to perform better for emotion recognition than either feature set alone. Although smaller speech segments such as words and syllables do carry emotion-specific information, the features extracted from them may not be sufficient on their own, as the recognition performance they yield is only marginal. Therefore, the emotion recognition systems (ERSs) developed from shorter speech segments may be used for online emotion verification tasks. IITKGP-SESC is used as the speech corpus for this study.
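
To make the feature-combination step concrete, the sketch below shows one plausible way to extract simple prosodic descriptors (pitch and energy statistics) and spectral descriptors (MFCC statistics) from a speech segment, concatenate them, and train an SVM on the combined vectors. The use of librosa and scikit-learn, the specific statistics chosen, and the helper names (prosodic_features, spectral_features, train_emotion_svm) are illustrative assumptions and not the configuration or implementation used in this study.

```python
# Sketch: combining prosodic and spectral features for segment-level
# emotion classification with an SVM. Library choices and feature
# settings are assumptions made for illustration only.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def prosodic_features(y, sr):
    """Coarse prosodic descriptors: pitch, energy, and duration statistics."""
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]            # keep voiced frames only
    if f0.size == 0:                  # fully unvoiced segment
        f0 = np.zeros(1)
    rms = librosa.feature.rms(y=y)[0] # frame-level energy
    return np.array([f0.mean(), f0.std(), rms.mean(), rms.std(),
                     len(y) / sr])    # segment duration as a rough tempo cue

def spectral_features(y, sr, n_mfcc=13):
    """Spectral descriptors: mean and deviation of MFCCs over the segment."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def combined_features(y, sr):
    """Concatenate prosodic and spectral vectors for one speech segment."""
    return np.concatenate([prosodic_features(y, sr), spectral_features(y, sr)])

def train_emotion_svm(segments, labels):
    """Train an SVM on combined features.

    `segments` is a hypothetical list of (waveform, sample_rate) pairs cut at
    utterance, word, or syllable boundaries; `labels` holds the emotion tags.
    """
    X = np.vstack([combined_features(y, sr) for y, sr in segments])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, labels)
    return clf
```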