
Speech emotion recognition using kernel sparse representation based classifier

In this paper, we propose to use a kernel sparse representation based classifier (KSRC) for the task of speech emotion recognition. We further improve the recognition performance of KSRC by imposing a group sparsity constraint. Speech utterances with the same emotion may differ in duration, but frame-sequence information does not play a crucial role in this task. Hence, in this work, we propose to use dynamic kernels, which explicitly model the variability in the duration of speech signals. Experimental results demonstrate that, given a suitable kernel, KSRC with the group sparsity constraint outperforms state-of-the-art classifiers based on support vector machines (SVMs).
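The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its dynamic kernels, so a simple mean-map (summation) kernel stands in: it averages an RBF kernel over all frame pairs of two utterances, which ignores frame order and accepts utterances of different length. The group sparsity constraint is written as a group lasso with one group per emotion class and solved with proximal gradient descent (ISTA), which is our choice of solver. The names `set_kernel`, `gram`, `group_sparse_code`, and `ksrc_predict`, the parameters `gamma` and `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np


def set_kernel(A, B, gamma=0.5):
    """Mean-map (summation) kernel between two variable-length frame sets.

    Assumed stand-in for a dynamic kernel. A is (n_a, d), B is (n_b, d).
    An RBF kernel is averaged over all frame pairs, so frame order is ignored
    and utterances of different duration are comparable.
    """
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return float(np.mean(np.exp(-gamma * np.maximum(sq, 0.0))))


def gram(X, Y, gamma=0.5):
    """Kernel matrix between two lists of frame sets."""
    return np.array([[set_kernel(a, b, gamma) for b in Y] for a in X])


def group_sparse_code(K, k_y, groups, lam=0.01, n_iter=500):
    """ISTA for the kernelised group-lasso problem (assumed formulation)

        min_a 0.5 * (k_yy - 2 a^T k_y + a^T K a) + lam * sum_g ||a_g||_2

    K is the training Gram matrix, k_y holds k(x_i, y), and groups[i]
    is the class of dictionary atom i (one group per class).
    """
    step = 1.0 / np.linalg.eigvalsh(K).max()  # 1/L, L = Lipschitz constant
    a = np.zeros(K.shape[0])
    for _ in range(n_iter):
        z = a - step * (K @ a - k_y)           # gradient step on smooth part
        for g in np.unique(groups):            # block soft-thresholding
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            if norm <= step * lam:
                z[idx] = 0.0
            else:
                z[idx] *= 1.0 - step * lam / norm
        a = z
    return a


def ksrc_predict(train_sets, train_labels, test_set, gamma=0.5, lam=0.01):
    """Assign test_set to the class with the smallest feature-space residual."""
    labels = np.asarray(train_labels)
    K = gram(train_sets, train_sets, gamma)
    k_y = np.array([set_kernel(x, test_set, gamma) for x in train_sets])
    k_yy = set_kernel(test_set, test_set, gamma)
    a = group_sparse_code(K, k_y, labels, lam)
    residuals = {}
    for c in np.unique(labels):
        a_c = np.where(labels == c, a, 0.0)    # keep only class-c coefficients
        # ||phi(y) - Phi a_c||^2 expanded with the kernel trick
        residuals[int(c)] = k_yy - 2.0 * a_c @ k_y + a_c @ K @ a_c
    return min(residuals, key=residuals.get)


if __name__ == "__main__":
    # Synthetic "utterances": 4-dim frames, random durations, two classes.
    rng = np.random.default_rng(0)

    def utterance(mean):
        return rng.normal(loc=mean, size=(rng.integers(15, 30), 4))

    train = [utterance(0.0) for _ in range(5)] + [utterance(2.0) for _ in range(5)]
    labels = [0] * 5 + [1] * 5
    print(ksrc_predict(train, labels, utterance(2.0)))
```

Because each group collects all atoms of one class, the block soft-thresholding step tends to switch off whole classes at once. This is the intuition behind using group sparsity for classification. The kernel trick means the feature map phi is never computed: every residual is expressed through `K`, `k_y`, and `k_yy` alone.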

Aroor Dinesh Dileep | Vinayak Abrol | Pulkit Sharma | Abhijeet Sachdev