SVM based Emotional Speaker Recognition using MFCC-SDC Features

Enhancing the performance of emotional speaker recognition has attracted growing interest in recent years. This paper presents a methodology for speaker recognition under different emotional states based on the multiclass Support Vector Machine (SVM) classifier. We compare two feature extraction methods for representing emotional speech utterances, to determine which yields the higher accuracy: traditional Mel-Frequency Cepstral Coefficients (MFCC) and MFCC combined with Shifted Delta Cepstra (MFCC-SDC). Experiments are conducted on the IEMOCAP database using two multiclass SVM approaches: One-Against-One (OAO) and One-Against-All (OAA). The results show that MFCC-SDC features outperform conventional MFCC.
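As a concrete illustration of the MFCC-SDC front end, the sketch below computes shifted delta cepstra over an MFCC matrix. The abstract does not state the paper's SDC parameters; the N-d-P-k values, the use of librosa for MFCC extraction, and the file name are assumptions, adapting the classic 7-1-3-7 language-identification configuration to 13 MFCCs.

```python
import numpy as np
import librosa  # assumed frontend; any extractor yielding a (n_coeff, T) matrix works


def sdc(cepstra, d=1, p=3, k=7):
    """Shifted Delta Cepstra over a (n_coeff, T) cepstral matrix.

    For each frame t, stacks k delta blocks:
        delta_i(t) = c(t + i*p + d) - c(t + i*p - d),  i = 0..k-1
    Frames lacking full context at either end are dropped.
    """
    n_coeff, n_frames = cepstra.shape
    first = d                              # earliest t with a left context
    last = n_frames - (k - 1) * p - d      # one past the latest valid t
    if last <= first:
        raise ValueError("utterance too short for the chosen (d, p, k)")
    blocks = []
    for i in range(k):
        shift = i * p
        delta = (cepstra[:, first + shift + d : last + shift + d]
                 - cepstra[:, first + shift - d : last + shift - d])
        blocks.append(delta)
    return np.vstack(blocks)  # shape: (n_coeff * k, last - first)


# Hypothetical usage: append SDC to the frame-aligned static MFCCs.
y, sr = librosa.load("utterance.wav", sr=16000)       # placeholder file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
s = sdc(mfcc, d=1, p=3, k=7)                          # assumed 13-1-3-7 setup
mfcc_sdc = np.vstack([mfcc[:, 1:1 + s.shape[1]], s])  # MFCC + SDC per frame
```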

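For the classification stage, the following is a minimal sketch of the two multiclass SVM strategies compared in the paper, written with scikit-learn's one-vs-one and one-vs-rest wrappers. The RBF kernel, the C value, and the mean-pooling of frames into a single vector per utterance are assumptions, since the abstract does not specify the kernel or how variable-length utterances are summarized.

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def utterance_vector(frames):
    """Collapse a (dim, T) feature matrix to one fixed-length vector.

    Mean pooling is a placeholder; the paper may use a different
    utterance-level representation.
    """
    return frames.mean(axis=1)


def build_classifiers():
    """Return the two multiclass decompositions over the same base SVM."""
    def base():
        # Kernel and C are assumed, not taken from the paper.
        return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
    return {
        "OAO": OneVsOneClassifier(base()),   # one binary SVM per speaker pair
        "OAA": OneVsRestClassifier(base()),  # one binary SVM per speaker vs. rest
    }


# Hypothetical usage, given per-utterance feature matrices and speaker labels:
# X = np.stack([utterance_vector(f) for f in per_utterance_features])
# for name, clf in build_classifiers().items():
#     clf.fit(X_train, y_train)
#     print(name, clf.score(X_test, y_test))
```

OAO trains a binary SVM for every pair of speakers and votes at test time, while OAA trains one SVM per speaker against all others; the paper evaluates both decompositions on the same features.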