Maximum Penalized Likelihood Kernel Regression for Fast Adaptation

This paper proposes a nonlinear generalization of the popular maximum-likelihood linear regression (MLLR) adaptation algorithm using kernel methods. The proposed method, called maximum penalized likelihood kernel regression adaptation (MPLKR), applies kernel regression with appropriate regularization to determine the affine model transform in a kernel-induced high-dimensional feature space. Although this is not the first attempt to apply kernel methods to conventional linear adaptation algorithms, MPLKR, unlike most other kernelized adaptation methods such as kernel eigenvoice or kernel eigen-MLLR, has the advantage of being a convex optimization problem, so its solution is always guaranteed to be globally optimal. In fact, the adapted Gaussian means can be obtained analytically by simply solving a system of linear equations. From the Bayesian perspective, MPLKR can also be considered the kernel version of maximum a posteriori linear regression (MAPLR) adaptation. Supervised speaker adaptation with MPLKR was evaluated on the Resource Management task and unsupervised adaptation on the Wall Street Journal 5K task, achieving word error rate reductions of 23.6% and 15.5%, respectively, over the speaker-independent model.
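The claim that the adapted means follow from a single linear system can be illustrated with a deliberately simplified sketch. It is not the paper's exact MPLKR objective: it assumes unit covariances, a Gaussian RBF kernel, an adapted mean of the form μ̂_r = μ_r + Wᵀφ(μ_r), and a ridge penalty λ‖W‖² that pulls every mean back toward its speaker-independent value (loosely mirroring the MAPLR prior). The names `rbf_kernel`, `kernel_adapt_means`, `lam` and `gamma` are hypothetical. Under these assumptions the objective Σ_r γ_r‖x̄_r − μ̂_r‖² + λ‖W‖² is convex, and the representer theorem (W = Φᵀ A) reduces it to the linear system (ΓK + λI)A = Γ(X̄ − M), where K is the kernel matrix over the speaker-independent means M and Γ holds the Gaussian occupancies.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix with k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_adapt_means(si_means, occupancies, obs_means, lam=1.0, gamma=1.0):
    """Hypothetical, simplified MPLKR-style adaptation of Gaussian means.

    Assumptions (not taken from the paper): unit covariances, adapted mean
    mu_hat_r = mu_r + W^T phi(mu_r), and an L2 penalty lam * ||W||^2 that
    shrinks every mean toward its speaker-independent value.

    si_means:    (R, d) speaker-independent means mu_r
    occupancies: (R,)   occupation counts gamma_r (0 for unseen Gaussians)
    obs_means:   (R, d) occupancy-weighted observation means x_bar_r
                 (ignored wherever gamma_r == 0, so NaN is allowed there)
    """
    R = len(si_means)
    K = rbf_kernel(si_means, si_means, gamma)
    G = np.diag(occupancies)
    # Target residuals x_bar_r - mu_r; unseen Gaussians contribute nothing.
    residual = np.where(occupancies[:, None] > 0, obs_means - si_means, 0.0)
    # Zeroing the gradient of the convex objective in A (with W = Phi^T A)
    # gives (G K + lam I) A = G (X_bar - M); lam > 0 keeps it nonsingular.
    A = np.linalg.solve(G @ K + lam * np.eye(R), G @ residual)
    # Every Gaussian, observed or not, moves through the shared kernel expansion.
    return si_means + K @ A
```

Because the kernel expansion is shared across all Gaussians, means with zero occupancy are still moved by the data observed for their neighbours, while a large `lam` falls back to the speaker-independent model and a vanishing `lam` (with every Gaussian observed) reproduces the observed means.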