Feature Space Mahalanobis Sequence Kernels: Application to SVM Speaker Verification

The generalized linear discriminant sequence (GLDS) kernel has been shown to provide very good performance and efficiency at the NIST Speaker Recognition Evaluations (SRE) in the last few years. This kernel is based on an explicit map of polynomial expansions of input frames which, because of practical limitations, have to be of degree less than or equal to three. In this paper, we consider an extension of the GLDS kernel that allows not only any polynomial degree but also any embedding, including infinite-dimensional ones associated with Mercer kernels (such as Gaussian kernels). It turns out that the resulting kernels belong to the family of posterior covariance kernels. However, their exact "kernelized" form involves the computation of the Gram matrix on background data, and may be intractable when the background corpus is very large (which is the case in speaker verification). To overcome this problem, we use a low-rank approximation of the Gram matrix to provide an approximate but tractable form of these kernels. We then present comparative experiments on NIST SRE 2005. The results show that our sequence kernel outperforms the GLDS kernel and, taken individually, performs similarly to the traditional universal background model-Gaussian mixture model (UBM-GMM) system. As expected, the fusion of both improves the scores.
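To make the idea concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes a Nyström-style low-rank approximation built from randomly chosen landmark frames, a Gaussian frame kernel, and a ridge-regularized second-moment matrix; the paper's actual low-rank method and normalization may differ. All function names and parameters (`fit_nystrom_map`, `n_landmarks`, `gamma`, `ridge`) are hypothetical. The sketch averages the approximate feature-space images of each sequence's frames and compares the two averages through the inverse second-moment matrix of the background data, in the spirit of the GLDS kernel.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_nystrom_map(background, n_landmarks=50, gamma=0.5, seed=0):
    """Return an explicit finite-dimensional map approximating the kernel embedding.

    Assumption: landmarks are drawn uniformly at random from the background
    frames, so only an n_landmarks x n_landmarks Gram matrix is decomposed.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(background), size=n_landmarks, replace=False)
    landmarks = background[idx]
    K_mm = gaussian_kernel(landmarks, landmarks, gamma)
    eigval, eigvec = np.linalg.eigh(K_mm)
    keep = eigval > 1e-10 * eigval.max()             # drop numerically null directions
    proj = eigvec[:, keep] / np.sqrt(eigval[keep])   # acts as K_mm^{-1/2} on kept directions

    def phi(X):
        return gaussian_kernel(X, landmarks, gamma) @ proj

    return phi

def fit_sequence_kernel(background, ridge=1e-3, **nystrom_args):
    """Build a sequence kernel k(A, B) = mean(phi(A))^T R^{-1} mean(phi(B))."""
    phi = fit_nystrom_map(background, **nystrom_args)
    F = phi(background)
    R = F.T @ F / len(F)                              # second-moment matrix on background data
    R_inv = np.linalg.inv(R + ridge * np.eye(R.shape[0]))  # ridge term is an assumption

    def kernel(seq_a, seq_b):
        return float(phi(seq_a).mean(0) @ R_inv @ phi(seq_b).mean(0))

    return kernel

# Toy usage with synthetic "frames" (random vectors standing in for acoustic features).
rng = np.random.default_rng(1)
background = rng.normal(size=(500, 4))
seq_a = rng.normal(size=(30, 4))
seq_b = rng.normal(size=(40, 4))

kernel = fit_sequence_kernel(background, n_landmarks=50, gamma=0.5)
k_ab = kernel(seq_a, seq_b)
k_ba = kernel(seq_b, seq_a)
k_aa = kernel(seq_a, seq_a)
```

Because each sequence is reduced to a single averaged vector, sequences of different lengths can be compared directly, and the cost of the decomposition depends on the number of landmarks rather than on the size of the background corpus.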
