Multi-class SVM for stressed speech recognition

This paper deals with a new automatic stressed recognition system based on kernel classification. We extracted advanced acoustic features from the stressed signals and employed a multi-class Support Vector Machines with different kernels to recognize speech utterances under stress. Gammatone Frequency Cepstral Coefficients are also established. The system implemented is tested using isolated words from SUSAS database with 4 classes: Neutral, Angry, Lombard and Loud. Experimental results show that the best performance is obtained when we use the auditory feature with different descriptors combination but it depends on the type of the kernel used.

[1]  Jiucang Hao,et al.  Emotion recognition by speech signals , 2003, INTERSPEECH.

[2]  John H. L. Hansen,et al.  Speech Under Stress: Analysis, Modeling and Recognition , 2007, Speaker Classification.

[3]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[4]  John H. L. Hansen,et al.  Classification of speech under stress using target driven features , 1996, Speech Commun..

[5]  Robert I. Damper,et al.  Multi-class and hierarchical SVMs for emotion recognition , 2010, INTERSPEECH.

[6]  DeLiang Wang,et al.  Analyzing noise robustness of MFCC and GFCC features in speaker identification , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  Chih-Jen Lin,et al.  A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.

[8]  S. Prasanna,et al.  Speech Recognition under Stress Condition , 2009 .

[9]  Yongmei Zhang,et al.  Binary tree SVM-based Emotion Recognition from Speech Signal , 2013 .

[10]  Constantine Kotropoulos,et al.  Emotional speech classification using Gaussian mixture models , 2005, 2005 IEEE International Symposium on Circuits and Systems.

[11]  Tong Zhang,et al.  An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods , 2001, AI Mag..

[12]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[13]  Thaweesak Yingthawornsuk,et al.  Speech Recognition using MFCC , 2012 .

[14]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classication , 2008 .

[15]  Ismail Shahin,et al.  Speech Recognition under Stress , 2009, International Conference on Bioinformatics & Computational Biology.

[16]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[17]  John H. L. Hansen,et al.  N-channel hidden Markov models for combined stressed speech classification and recognition , 1999, IEEE Trans. Speech Audio Process..

[18]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[19]  John H. L. Hansen,et al.  Getting started with SUSAS: a speech under simulated and actual stress database , 1997, EUROSPEECH.

[20]  L. Deng Ieee Transactions on Speech and Audio Processing, Speech Trajectory Discrimination Using the Minimum Classiication Error Learning , 1997 .

[21]  Ishwar K. Sethi,et al.  Investigation of combining SVM and decision tree for emotion classification , 2005, Seventh IEEE International Symposium on Multimedia (ISM'05).

[22]  P. Boersma ACCURATE SHORT-TERM ANALYSIS OF THE FUNDAMENTAL FREQUENCY AND THE HARMONICS-TO-NOISE RATIO OF A SAMPLED SOUND , 1993 .

[23]  John H. L. Hansen,et al.  A comparative study of traditional and newly proposed features for recognition of speech under stress , 2000, IEEE Trans. Speech Audio Process..

[24]  Oh-Wook Kwon,et al.  EMOTION RECOGNITION BY SPEECH SIGNAL , 2003 .

[25]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.