NMF-based Cepstral Features for Speech Emotion Recognition

Speech Emotion Recognition (SER) has attracted growing attention in recent years, and various methods have been proposed for it. Feature extraction is the central part of any SER method and aims to obtain effective emotional features from the speech signal. Among the most important features in speech processing are Mel Frequency Cepstral Coefficients (MFCCs). Information about the vocal production mechanisms of speakers in different emotional states can improve the discriminative ability of such features for the SER task. This work proposes a novel feature extraction scheme for SER that integrates this information by decomposing emotional speech spectra, providing an improved spectral representation of various emotions. Two novel procedures are presented under this scheme. In the first procedure, cepstral-like features are obtained with a filter bank that is computed by applying Non-negative Matrix Factorization (NMF) to emotional speech spectra. In the second procedure, the NMF activation coefficients obtained by decomposing the speech spectra are used as the new features. Finally, to increase the discriminative ability of the features among emotion classes, each feature vector is normalized to its mean value. In experiments on the Emo-DB database, fusing the proposed features with MFCCs improves the performance of an SER system over both the conventional MFCC baseline and simple unsupervised NMF-based features derived from the speech spectra.
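The two procedures described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a non-negative spectrogram V (frequency bins by frames), uses the standard multiplicative updates for the Euclidean NMF objective, treats the columns of the learned basis W as filters, and reads "normalized to its mean value" as dividing each per-frame feature vector by its mean. The function names, the rank, the iteration counts, and the choice of an unnormalized DCT-II are all hypothetical choices made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative V (freq x frames) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral bases
    return W, H

def nmf_activations(V, W, n_iter=200, eps=1e-10, seed=0):
    """Second procedure (assumed form): keep W fixed and estimate activations H."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def dct_matrix(n_out, n_in):
    """Unnormalized DCT-II matrix of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_in)

def cepstral_like(V, W, n_ceps, eps=1e-10):
    """First procedure (assumed form): W as a filter bank, then log and DCT."""
    energies = W.T @ V                          # (rank, frames) filter-bank outputs
    return dct_matrix(n_ceps, W.shape[1]) @ np.log(energies + eps)

def mean_normalize(F, eps=1e-10):
    """Divide each per-frame feature vector (column) by its mean (assumed reading)."""
    return F / (F.mean(axis=0, keepdims=True) + eps)

# Usage on synthetic data standing in for emotional speech spectra.
rng = np.random.default_rng(42)
V_train = rng.random((64, 100)) + 0.01          # hypothetical training spectra
V_utt = rng.random((64, 30)) + 0.01             # hypothetical test utterance
W, _ = nmf(V_train, rank=12)
ceps = cepstral_like(V_utt, W, n_ceps=8)        # shape (8, 30)
acts = mean_normalize(nmf_activations(V_utt, W))  # shape (12, 30)
```

In this sketch the mean normalization is applied only to the activation features, which are non-negative, so the division is well behaved; how the normalization is applied to the cepstral-like features is not specified in the abstract and is left out here.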
