A Non-Linear Speaker Adaptation Technique using Kernel Ridge Regression

We propose a non-linear model-space transformation for speaker or environment adaptation based on weighted kernel ridge regression (KRR). The transformation is given by a generalized least squares linear regression in a kernel-induced feature space, with the Gaussian mixture model means as inputs and the adaptation frames as targets. Using the "kernel trick", the solution to the optimization problem is obtained by solving a system of linear equations involving the Gram matrix of the input variables. We show that MLLR is a special case of KRR when a linear kernel is employed. Furthermore, we study an efficient low-rank approximation to the kernel matrix, termed the "rectangle method", in which the regressors are chosen to be a small set of clustered adaptation frames. Experiments conducted on the EARS database (English conversational telephone speech) indicate that KRR with a Gaussian RBF kernel outperforms standard regression class-based MLLR.
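The following is a minimal NumPy sketch of weighted KRR mean adaptation. It is not the paper's implementation and rests on simplifying assumptions of ours:
- Gaussian covariances are taken as identity, so the generalized least squares weighting reduces to occupancy weights.
- Frame targets are pooled per Gaussian using the posteriors gamma_m(t).
- The ridge penalty shrinks the transform toward zero.

With occupancy matrix Gamma and Gram matrix K over the means, the dual coefficients solve (Gamma K + lam I) alpha = Gamma Ybar, and the adapted means are K alpha. The linear kernel 1 + x.z corresponds to an affine transform of the extended means [1, mu]. Its output therefore matches the primal weighted ridge solution, which approaches a single global MLLR mean transform as lam goes to 0.

The rectangle_adapt function is our own reading of the rectangle method as a subset-of-regressors approximation, using k-means centroids of the adaptation frames as regressors. All function names, the RBF width gamma, lam, the number of centers, and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2); rows of X, Z are vectors."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def linear_kernel(X, Z):
    """Linear kernel on extended vectors [1, x], i.e. k(x, z) = 1 + x.z."""
    return 1.0 + X @ Z.T

def occupancy_stats(frames, post):
    """Pool frames per Gaussian: occupancies g (M,) and weighted mean targets (M, d).
    For a squared loss this pooling is exact up to an additive constant."""
    g = post.sum(0)
    sums = post.T @ frames
    ybar = np.divide(sums, g[:, None], out=np.zeros_like(sums), where=g[:, None] > 0)
    return g, ybar

def krr_adapt(means, frames, post, kernel, lam=1e-2):
    """Dual weighted KRR: solve (G K + lam I) alpha = G ybar, return adapted means K alpha."""
    g, ybar = occupancy_stats(frames, post)
    K = kernel(means, means)
    G = np.diag(g)
    alpha = np.linalg.solve(G @ K + lam * np.eye(len(means)), G @ ybar)
    return K @ alpha

def mllr_primal(means, frames, post, lam=1e-2):
    """Primal weighted ridge on xi = [1, mu]; lam -> 0 gives a global MLLR mean transform
    (identity covariances assumed)."""
    g, ybar = occupancy_stats(frames, post)
    Xi = np.hstack([np.ones((len(means), 1)), means])
    G = np.diag(g)
    Wt = np.linalg.solve(Xi.T @ G @ Xi + lam * np.eye(Xi.shape[1]), Xi.T @ G @ ybar)
    return Xi @ Wt

def kmeans(X, R, iters=20, seed=0):
    """Plain Lloyd's k-means; its centroids stand in for 'clustered adaptation frames'."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), R, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for r in range(R):
            if np.any(labels == r):
                C[r] = X[labels == r].mean(0)
    return C

def rectangle_adapt(means, frames, post, centers, kernel, lam=1e-2):
    """Subset-of-regressors KRR with a rectangular (M x R) kernel matrix (our assumption)."""
    g, ybar = occupancy_stats(frames, post)
    Kmr = kernel(means, centers)
    Krr = kernel(centers, centers)
    G = np.diag(g)
    beta = np.linalg.solve(Kmr.T @ G @ Kmr + lam * Krr, Kmr.T @ G @ ybar)
    return Kmr @ beta

# Hypothetical toy data: M Gaussian means, T frames from a non-linearly shifted source.
rng = np.random.default_rng(0)
D, M, T = 2, 6, 300
means = rng.normal(size=(M, D))
labels = rng.integers(0, M, size=T)
frames = np.tanh(2.0 * means[labels]) + 0.3 + 0.05 * rng.normal(size=(T, D))
post = np.full((T, M), 0.02 / (M - 1))       # smoothed one-hot posteriors; rows sum to 1
post[np.arange(T), labels] = 0.98

mu_lin = krr_adapt(means, frames, post, linear_kernel)
mu_mllr = mllr_primal(means, frames, post)
mu_rbf = krr_adapt(means, frames, post, rbf_kernel)
centers = kmeans(frames, R=3)
mu_rect = rectangle_adapt(means, frames, post, centers, rbf_kernel)
```

In this sketch, mu_lin and mu_mllr agree to floating-point precision, illustrating the linear-kernel special case. The rectangle variant solves an R x R system instead of an M x M one, which is where a low-rank approximation saves work when R is much smaller than M.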
