Prior parameter transformation for unsupervised speaker adaptation

In a strictly Bayesian approach, the prior parameters are assumed to be known from common or subjective knowledge. A practical solution for maximum a posteriori (MAP) adaptation, however, is to adopt an empirical Bayesian approach, in which the prior parameters are estimated directly from the training speech data. As a result, the prior parameters can suffer from mismatches between training and testing conditions. We have proposed a prior parameter transformation (PPT) adaptation approach that transforms the prior parameters so that they better represent the new speaker. In this paper we extend PPT to unsupervised mode. For easily confused speech units, different transformation matrices are applied to keep them distinct. Initial experiments show that the PPT algorithm yields substantial improvement with a small amount of adaptation data, even in unsupervised mode.
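The abstract does not give the PPT update equations. The sketch below is a minimal illustration under stated assumptions, not the paper's method. It assumes an affine transform of each Gaussian prior mean, mu0' = W_c mu0 + b_c. One (W_c, b_c) pair is shared by each class c of speech units, so easily confused units can be placed in different classes with different matrices. The transformed prior is then used in the standard MAP mean update, mu = (tau mu0' + sum_t gamma_t x_t) / (tau + sum_t gamma_t). The names `ppt_adapt`, `unit_class`, `transforms` and `tau` are hypothetical. The transforms are taken as given rather than estimated. In unsupervised mode, the occupancies gamma_t would come from a first-pass recognition hypothesis instead of a known transcription; here they are simply supplied.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, b: Vector, mu: Vector) -> Vector:
    """Apply the affine transform W @ mu + b to a prior mean vector."""
    return [sum(w_ij * m_j for w_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(W, b)]


def map_mean(prior_mean: Vector, tau: float,
             frames: Sequence[Vector], gammas: Sequence[float]) -> Vector:
    """MAP update of a Gaussian mean from soft occupancies gamma_t (tau > 0).

    mu_hat = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
    """
    occ = sum(gammas)
    dim = len(prior_mean)
    acc = [0.0] * dim
    for g, x in zip(gammas, frames):
        for d in range(dim):
            acc[d] += g * x[d]
    return [(tau * prior_mean[d] + acc[d]) / (tau + occ) for d in range(dim)]


def ppt_adapt(prior_means: Dict[str, Vector],
              unit_class: Dict[str, str],
              transforms: Dict[str, Tuple[Matrix, Vector]],
              tau: float,
              stats: Dict[str, Tuple[List[Vector], List[float]]]
              ) -> Dict[str, Vector]:
    """Hypothetical PPT-style adaptation: transform each prior mean with its
    class transform, then apply the MAP update with the adaptation data."""
    adapted: Dict[str, Vector] = {}
    for unit, mu0 in prior_means.items():
        W, b = transforms[unit_class[unit]]   # class-specific transform
        mu_prior = affine(W, b, mu0)          # transformed prior mean
        frames, gammas = stats.get(unit, ([], []))
        adapted[unit] = map_mean(mu_prior, tau, frames, gammas)
    return adapted


if __name__ == "__main__":
    prior_means = {"b": [0.0, 0.0], "p": [1.0, 1.0]}
    unit_class = {"b": "voiced", "p": "unvoiced"}
    transforms = {
        "voiced": ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]),
        "unvoiced": ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
    }
    stats = {"b": ([[1.0, 1.0]], [1.0])}      # no frames observed for "p"
    print(ppt_adapt(prior_means, unit_class, transforms, 1.0, stats))
```

In this toy run, "b" is pulled from its transformed prior [0.5, 0.5] toward its one observed frame, giving [0.75, 0.75]. "p" has no adaptation frames, so it falls back to its transformed prior [2.0, 2.0]. This fallback is how a transformed prior can help when adaptation data are sparse.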
