A supervised non-negative matrix factorization model for speech emotion recognition

Abstract: Feature representation plays a critical role in speech emotion recognition (SER). As a dimensionality reduction method, non-negative matrix factorization (NMF) obtains a low-dimensional representation of data through matrix decomposition and can make the data more distinguishable. To improve the recognition ability of NMF for SER, we investigate NMF further and propose a supervised NMF model, called joint discrimination ability and similarity constraint of NMF (DSNMF). This model incorporates the discriminative information and similarity information of the samples into basic NMF as prior knowledge, so that the original data are decomposed into a more discriminative low-dimensional representation. Specifically, the labels of the training set are used to improve the discriminative ability of the model, while the similarity between training samples is used to make similar samples cluster more tightly in the low-dimensional space. In addition, the convergence of DSNMF is proved theoretically and verified experimentally. Extensive experiments on the EMODB and IEMOCAP corpora show that the low-dimensional representations produced by the proposed approach are classified more accurately than those produced by other NMF models.
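To make the idea of adding label and similarity information to basic NMF more concrete, the sketch below implements a generic graph-regularized NMF with multiplicative updates, where the similarity graph is built from class labels. It is not the DSNMF objective or update rule, which the abstract does not specify. The function name `label_graph_nmf`, the weight `lam`, the binary same-label graph `W`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def label_graph_nmf(X, labels, rank, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Illustrative graph-regularized NMF (NOT the paper's DSNMF).

    Minimizes ||X - U V^T||_F^2 + lam * trace(V^T (D - W) V), where
    X is (n_features, n_samples) and non-negative, and W links samples
    that share a label (an assumed, simplistic similarity graph).
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    U = rng.random((n_features, rank)) + eps   # basis matrix
    V = rng.random((n_samples, rank)) + eps    # low-dimensional codes

    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)                   # no self-loops
    D = np.diag(W.sum(axis=1))                 # degree matrix

    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        recon = np.linalg.norm(X - U @ V.T) ** 2
        smooth = np.trace(V.T @ (D - W) @ V)
        history.append(recon + lam * smooth)
    return U, V, history

# Toy usage: 6 features, 8 samples, two hypothetical emotion classes.
X_demo = np.random.default_rng(1).random((6, 8))
labels_demo = [0, 0, 0, 0, 1, 1, 1, 1]
U_demo, V_demo, history_demo = label_graph_nmf(X_demo, labels_demo, rank=2)
print(U_demo.shape, V_demo.shape)
```

The rows of `V_demo` are the low-dimensional codes that would be passed to a classifier. The graph term pulls the codes of same-label samples toward each other, which mirrors the aggregation effect described in the abstract, and tracking `history` gives an empirical check that the objective decreases over the iterations.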
