Speech emotion recognition using kernel sparse representation based classifier

In this paper, we propose to use a kernel sparse representation based classifier (KSRC) for the task of speech emotion recognition. Further, the recognition performance of the KSRC is improved by imposing a group sparsity constraint. Speech utterances carrying the same emotion may differ in duration, but the frame sequence information does not play a crucial role in this task. Hence, we propose to use dynamic kernels, which explicitly model the variability in the duration of speech signals. Experimental results demonstrate that, given a suitable kernel, the KSRC with a group sparsity constraint performs better than state-of-the-art support vector machine (SVM) based classifiers.
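To make the idea concrete, the following is a minimal sketch of a kernel sparse representation classifier with a group sparsity penalty. It is not the implementation used in this work. Several things here are assumptions made only for illustration: a Gaussian (RBF) kernel on fixed-length vectors stands in for a dynamic kernel on variable-length utterances, the group-lasso problem is solved with plain proximal gradient descent (ISTA), and the names `rbf_kernel`, `group_sparse_code`, `ksrc_classify`, `gamma`, and `lam` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Stand-in kernel (assumption): the paper uses dynamic kernels instead.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def group_sparse_code(K, k_y, labels, lam=0.01, n_iter=500):
    # Minimise 0.5 * a^T K a - a^T k_y + lam * sum_c ||a_c||_2 with ISTA,
    # where each group c holds the coefficients of one class's training examples.
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / Lipschitz constant of the gradient
    a = np.zeros(K.shape[0])
    for _ in range(n_iter):
        z = a - step * (K @ a - k_y)        # gradient step on the smooth part
        for c in np.unique(labels):
            idx = labels == c
            norm = np.linalg.norm(z[idx])
            # Block soft-thresholding: shrinks or zeroes a whole class at once.
            z[idx] *= max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        a = z
    return a

def ksrc_classify(X, labels, y, gamma=1.0, lam=0.01):
    # Assign y to the class whose atoms reconstruct it best in kernel space.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)[None, :]
    labels = np.asarray(labels)
    K = rbf_kernel(X, X, gamma)             # Gram matrix of training examples
    k_y = rbf_kernel(X, y, gamma)[:, 0]     # kernel between training set and y
    k_yy = rbf_kernel(y, y, gamma)[0, 0]
    a = group_sparse_code(K, k_y, labels, lam)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        a_c = np.where(labels == c, a, 0.0)  # keep only class-c coefficients
        res = k_yy - 2.0 * a_c @ k_y + a_c @ K @ a_c  # squared residual in feature space
        if res < best_res:
            best, best_res = c, res
    return int(best)

# Toy usage: two well-separated classes of 2-D points.
X = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
print(ksrc_classify(X, labels, [0.1, 0.1]))
```

Because the classifier only touches the data through `K`, `k_y`, and `k_yy`, a kernel defined directly on variable-length frame sequences could in principle replace `rbf_kernel` without changing the rest of the sketch.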
