Speech based emotion recognition using spectral feature extraction and an ensemble of kNN classifiers

Security, including cyber security, is an important issue in existing and emerging technology. Cyber security must go beyond password-based systems to deter criminal activity. A framework that performs human biometric recognition and emotion recognition in parallel can enable applications to access personal or public information securely. This paper studies speech-based emotion recognition using a pattern recognition paradigm with spectral feature extraction and an ensemble of k-nearest neighbor (kNN) classifiers. The five spectral features are the linear predictive cepstrum (CEP), mel frequency cepstrum (MFCC), line spectral frequencies (LSF), adaptive component weighted cepstrum (ACW), and post-filter cepstrum (PFL). The bagging algorithm is used to train the ensemble of kNNs, and fusion is accomplished implicitly by ensemble classification. The LDC emotional prosody speech database is used in all experiments. Results show that the largest gain in performance comes from using two kNNs rather than a single kNN.
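The bagging and implicit-fusion steps can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the Euclidean distance, the plurality-vote fusion rule, the value of k, and the two-dimensional toy vectors standing in for spectral features are all assumptions made for illustration, and the emotion labels are hypothetical.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k):
    """Label a query by plurality vote among its k nearest training vectors."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def train_bagged_knn(features, labels, n_members, seed=0):
    """Bagging: each member keeps its own bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(features)
    members = []
    for _ in range(n_members):
        # Draw n indices with replacement (a bootstrap sample).
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(([features[i] for i in idx], [labels[i] for i in idx]))
    return members


def ensemble_predict(members, query, k=3):
    """Fuse the members' decisions by plurality vote (assumed fusion rule)."""
    votes = Counter(knn_predict(x, y, query, k) for x, y in members)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters of hypothetical 2-D feature vectors.
toy_x = [(0.1 * i, 0.1 * i) for i in range(10)]
toy_x += [(10 + 0.1 * i, 10 + 0.1 * i) for i in range(10)]
toy_y = ["neutral"] * 10 + ["angry"] * 10

ensemble = train_bagged_knn(toy_x, toy_y, n_members=5)
print(ensemble_predict(ensemble, (0.5, 0.5)))
print(ensemble_predict(ensemble, (10.5, 10.5)))
```

Because every member votes on the same query and only the combined vote is reported, no separate fusion stage is needed, which is the sense in which fusion is implicit. Changing `n_members` from 1 to 2 and beyond is the kind of comparison the results describe.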
