Study of prosodic feature extraction for multidialectal Odia speech emotion recognition

This paper presents a speaker-independent, text-dependent speech emotion recognition system for the Cuttacki, Sambalpuri, and Berhampuri dialects of the Odia language. A dialect is any distinguishable variety of a language spoken by a group of people, and emotions lend naturalness to speech. Prosodic features, represented by pitch, energy, duration, and formants, are extracted from the speech signal and used to classify emotions. To evaluate system performance, the Orthogonal Forward Selection (OFS) algorithm is used to select the most significant features, and a Gaussian Mixture Model (GMM) and a Support Vector Machine (SVM) are used for classification. Analysis of the results on the OFS-selected features shows that the SVM outperforms the GMM. The study also reveals distinctions between the emotions expressed by male and female speakers after feature extraction.
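
The paper does not include code; the sketch below is a minimal illustration of how utterance-level prosodic features of the kind described (pitch, energy, duration, and formants) could be extracted. The use of librosa, the pYIN pitch estimator, the LPC-pole formant proxy, and the particular summary statistics are assumptions for illustration, not the authors' exact procedure.

```python
# A minimal sketch of utterance-level prosodic feature extraction, assuming
# librosa is available; the exact features and statistics are assumptions.
import numpy as np
import librosa

def prosodic_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)

    # Pitch (F0) contour via the pYIN estimator; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]

    # Short-time energy from the RMS envelope.
    rms = librosa.feature.rms(y=y)[0]

    # Utterance duration in seconds.
    duration = len(y) / sr

    # Crude formant proxy: LPC pole angles of a 30 ms mid-utterance frame.
    mid = len(y) // 2
    frame = y[mid:mid + int(0.03 * sr)]
    poles = np.roots(librosa.lpc(frame, order=10))
    poles = poles[np.imag(poles) > 0]
    formants = np.sort(np.angle(poles) * sr / (2 * np.pi))
    formants = np.pad(formants, (0, 3))[:3]   # keep the three lowest peaks

    # Summary statistics form the utterance-level feature vector.
    return np.hstack([
        f0.mean(), f0.std(), f0.max() - f0.min(),   # pitch statistics
        rms.mean(), rms.std(),                      # energy statistics
        duration,                                   # duration
        formants,                                   # rough F1-F3 estimates
    ])
```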

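OFS selects features greedily: each remaining candidate is orthogonalised (Gram-Schmidt) against the features already chosen, and the candidate whose orthogonal component explains the most label variance is kept. The sketch below illustrates that idea; the scoring criterion and the assumption that emotion labels are numerically encoded are simplifications, not the exact original formulation.

```python
# A minimal sketch of Orthogonal Forward Selection (OFS) with Gram-Schmidt
# orthogonalisation; X is an (utterances x features) matrix and y a numeric
# label vector. The squared-correlation score is an assumed criterion.
import numpy as np

def ofs_select(X, y, n_select):
    n_samples, n_features = X.shape
    y = y - y.mean()
    selected, basis = [], []          # chosen column indices and orthogonal basis

    for _ in range(n_select):
        best_idx, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            v = X[:, j] - X[:, j].mean()
            # Remove the components already explained by selected features.
            for q in basis:
                v = v - (v @ q) * q
            norm = np.linalg.norm(v)
            if norm < 1e-12:          # feature is (nearly) redundant
                continue
            q_new = v / norm
            score = (q_new @ y) ** 2  # label variance explained by this direction
            if score > best_score:
                best_idx, best_score, best_q = j, score, q_new
        if best_idx is None:
            break
        selected.append(best_idx)
        basis.append(best_q)
    return selected
```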
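
For classification, a common setup is to fit one GMM per emotion class and assign each test utterance to the class with the highest log-likelihood, then compare this against an SVM trained on the same OFS-selected features. The sketch below, using scikit-learn, assumes that setup; the train/test split, number of mixture components, covariance type, and SVM hyperparameters are illustrative choices, not those reported in the paper.

```python
# A minimal sketch comparing GMM and SVM classifiers on the selected prosodic
# features, assuming scikit-learn; one GaussianMixture is fit per emotion class.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def evaluate(X, y, n_components=4):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    # One GMM per emotion; classify by maximum log-likelihood.
    classes = np.unique(y_tr)
    gmms = {c: GaussianMixture(n_components=n_components,
                               covariance_type='diag',
                               random_state=0).fit(X_tr[y_tr == c])
            for c in classes}
    ll = np.column_stack([gmms[c].score_samples(X_te) for c in classes])
    gmm_pred = classes[np.argmax(ll, axis=1)]

    # RBF-kernel SVM on the same features.
    svm = SVC(kernel='rbf', C=10, gamma='scale').fit(X_tr, y_tr)
    svm_pred = svm.predict(X_te)

    return accuracy_score(y_te, gmm_pred), accuracy_score(y_te, svm_pred)
```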