Prosodic feature-based speech emotion recognition at segmental and suprasegmental levels

Speech emotion recognition plays an increasingly significant role in human-computer interfaces as well as in communication among human beings. This paper presents the results of investigations into emotion recognition based on the prosodic features of 1050 segmental and 1400 suprasegmental speech wave files in English. The investigations covered the neutral state and six basic emotions, collected from ten female speakers of Indian English. The features considered in this investigation are intensity, pitch, and duration (speech rate), which were statistically analyzed. The role of each feature in emotion recognition was quantitatively assessed in terms of the classification rates of the K-Nearest Neighbor, Naive Bayes, and Artificial Neural Network classifiers. At the segmental level, all emotions could be classified from the prosodic feature set with an average emotion classification rate of 95.91%, and these results were validated. These results are significant because they indicate that emotions can be classified from minimal input, saving time and effort. Notably, the available literature acknowledges the existence of prosody only at the suprasegmental level. At the suprasegmental level, all emotions were recognized with an average classification rate of 91.96%.
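The classification step described above, i.e., mapping per-utterance prosodic features (intensity, pitch, speech rate) to emotion labels with a K-Nearest Neighbor classifier, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature values, class means, and the three-emotion subset are synthetic placeholders, and scikit-learn's `KNeighborsClassifier` stands in for whatever KNN implementation the authors used.

```python
# Illustrative KNN classification of prosodic feature vectors.
# Each row is a hypothetical utterance described by three prosodic
# features: mean intensity (dB), mean pitch (Hz), speech rate (syll/s).
# All numbers below are synthetic, chosen only to make the classes separable.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

emotions = ["neutral", "anger", "sadness"]  # subset of the seven classes studied
class_means = {
    "neutral": [60.0, 180.0, 4.0],   # hypothetical class centers
    "anger":   [75.0, 260.0, 5.5],
    "sadness": [55.0, 150.0, 3.0],
}
n_per_class = 30
X = np.vstack([
    rng.normal(class_means[e], [2.0, 10.0, 0.3], size=(n_per_class, 3))
    for e in emotions
])
y = np.repeat(emotions, n_per_class)

# Standardize so pitch (in Hz) does not dominate the Euclidean distance
X = StandardScaler().fit_transform(X)

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
acc = clf.score(X, y)
print(f"classification rate on this toy data: {acc:.2f}")
```

Standardizing the features before computing distances matters for KNN, since the three prosodic features live on very different numeric scales; the same preprocessing concern applies to any distance-based classifier used on raw acoustic measurements.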