Class-specific GMM based intermediate matching kernel for classification of varying length patterns of long duration speech using support vector machines

Dynamic kernel-based support vector machines (SVMs) are used for classification of varying length patterns. This paper explores the use of the intermediate matching kernel (IMK) as a dynamic kernel for classification of varying length patterns of long duration speech represented as sets of feature vectors. The main issue in the construction of an IMK is the choice of the set of virtual feature vectors used to select the local feature vectors for matching. The components of a class-independent Gaussian mixture model (CIGMM) have been used earlier as a representation for the set of virtual feature vectors. For every component of the CIGMM, the local feature vector with the highest probability of belonging to that component is selected from each of the two sets of local feature vectors, and a base kernel is computed between the two selected vectors. The IMK is computed as the sum of the base kernels over all components of the CIGMM. The construction of the CIGMM-based IMK does not use class-specific information, because the local feature vectors are selected using the components of a CIGMM that is common to all the classes. We propose two novel methods to build a more discriminative IMK-based SVM classifier by considering a set of virtual feature vectors specific to each class, depending on the approach to multiclass classification using SVMs. In the first method, we propose a class-wise IMK-based SVM for every class, using the components of the GMM built for a class as the set of virtual feature vectors for that class in the one-against-the-rest approach to multiclass pattern classification. In the second method, we propose a pairwise IMK-based SVM for every pair of classes, using the components of the GMM built for a pair of classes as the set of virtual feature vectors for that pair of classes in the one-against-one approach to multiclass classification.
We also propose using mixture coefficient weighted and responsibility term weighted base kernels in the computation of class-specific IMKs to improve their discrimination ability. This paper also proposes posterior probability weighted dynamic kernels to improve classification performance and reduce the number of support vectors. The performance of the SVM-based classifiers using the proposed class-specific IMKs is studied for speech emotion recognition and speaker identification tasks and compared with that of SVM-based classifiers using state-of-the-art dynamic kernels.
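The CIGMM-based IMK construction described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not specify: the GMM has diagonal covariances, "probability of belonging to a component" is taken to be the component's responsibility (posterior probability) for a feature vector, and the base kernel is a Gaussian kernel with a hypothetical width parameter `delta`. The function names are illustrative only and do not come from the paper.

```python
import numpy as np


def responsibilities(X, weights, means, variances):
    """Posterior probability of each GMM component for each row of X.

    Assumes a diagonal-covariance GMM (an assumption of this sketch).
    X: (T, d), weights: (Q,), means: (Q, d), variances: (Q, d).
    Returns an array of shape (T, Q) whose rows sum to 1.
    """
    diff = X[:, None, :] - means[None, :, :]                     # (T, Q, d)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_pdf                        # (T, Q)
    log_joint -= log_joint.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def gaussian_base_kernel(x, y, delta=0.5):
    """Gaussian base kernel between two local feature vectors (assumed choice)."""
    return float(np.exp(-delta * np.sum((x - y) ** 2)))


def imk(X, Y, weights, means, variances, delta=0.5):
    """Intermediate matching kernel between two sets of local feature vectors.

    For every GMM component, pick the vector in X and the vector in Y with the
    highest responsibility for that component, then sum the base kernels.
    """
    best_x = responsibilities(X, weights, means, variances).argmax(axis=0)  # (Q,)
    best_y = responsibilities(Y, weights, means, variances).argmax(axis=0)  # (Q,)
    return sum(gaussian_base_kernel(X[i], Y[j], delta)
               for i, j in zip(best_x, best_y))
```

Because one vector per component is selected from each set, the two sets may have different numbers of feature vectors, which is what lets the kernel handle varying length patterns. Under the same assumptions, a class-wise or pairwise variant would pass the parameters of a GMM built for one class or one pair of classes instead of the class-independent parameters.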
