Structural maximum a posteriori linear regression for fast HMM adaptation

Transformation-based model adaptation techniques have been used for many years to improve robustness of speech recognition systems. While the estimation criterion used to estimate transformation parameters has been mainly based on maximum likelihood estimation (MLE), Bayesian versions of some of the most popular transformation-based adaptation methods have been recently introduced, like MAPLR, a maximum a posteriori(MAP) based version of the well-known maximum likelihood linear regression (MLLR) algorithm. This is in fact an attempt to constraint parameter estimation in order to obtain reliable estimation with a limited amount of data, not only to prevent overfitting the adaptation data but also to allow integration of prior knowledge into transformation-based adaptation techniques. Since such techniques require the estimation of a large number of transformation parameters when the amount of adaptation data is large, it is also required to define a large number of prior densities for these parameters. Robust estimation of these prior densities is therefore a crucial issue that directly affects the efficiency and effectiveness of the Bayesian techniques. This paper proposes to estimate these priors using the notion of hierarchical priors, embedded into the tree structure used to control transformation complexity. The proposed algorithm, called structural MAPLR (SMAPLR), has been evaluated on the Spoke3 1993 test set of the WSJ task. It is shown that SMAPLR reduces the risk of overtraining and exploits the adaptation data much more efficiently than MLLR, leading to a significant reduction of the word error rate for any amount of adaptation data.

[1]  Chin-Hui Lee,et al.  A structural Bayes approach to speaker adaptation , 2001, IEEE Trans. Speech Audio Process..

[2]  Robert M. Gray,et al.  An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..

[3]  Philip C. Woodland,et al.  Flexible speaker adaptation for large vocabulary speech recognition , 1995, EUROSPEECH.

[4]  Chin-Hui Lee,et al.  A maximum-likelihood approach to stochastic matching for robust speech recognition , 1996, IEEE Trans. Speech Audio Process..

[5]  Vassilios Digalakis,et al.  Speaker adaptation using combined transformation and Bayesian methods , 1996, IEEE Trans. Speech Audio Process..

[6]  Yifan Gong,et al.  Speech recognition in noisy environments: A survey , 1995, Speech Commun..

[7]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[8]  Chin-Hui Lee,et al.  On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate , 1997, IEEE Trans. Speech Audio Process..

[9]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[10]  Chin-Hui Lee,et al.  On stochastic feature and model compensation approaches to robust speech recognition , 1998, Speech Commun..

[11]  Roland Kuhn,et al.  Rapid speaker adaptation in eigenvoice space , 2000, IEEE Trans. Speech Audio Process..

[12]  武田 一哉,et al.  Workshop on Robust Methods for Speech Recognition in Adverse Conditions報告 , 1999 .

[13]  Chin-Hui Lee,et al.  Joint maximum a posteriori adaptation of transformation and HMM parameters , 2001, IEEE Trans. Speech Audio Process..

[14]  Chin-Hui Lee,et al.  Maximum a posteriori linear regression for hidden Markov model adaptation , 1999, EUROSPEECH.

[15]  Chin-Hui Lee,et al.  A study on speaker adaptation of continuous density HMM parameters , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[16]  Qiang Huo,et al.  On adaptive decision rules and decision parameter adaptation for automatic speech recognition , 2000, Proceedings of the IEEE.

[17]  Chin-Hui Lee,et al.  Unsupervised adaptation using structural Bayes approach , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[18]  Jen-Tzung Chien,et al.  A hybrid algorithm for speaker adaptation using MAP transformation and adaptation , 1997 .

[19]  Wu Chou,et al.  Robust decision tree state tying for continuous speech recognition , 2000, IEEE Trans. Speech Audio Process..

[20]  Philip C. Woodland,et al.  Speaker adaptation of continuous density HMMs using multivariate linear regression , 1994, ICSLP.

[21]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[22]  Koichi Shinoda,et al.  Structural MAP speaker adaptation using hierarchical priors , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[23]  Wu Chou,et al.  Decision tree state tying based on segmental clustering for acoustic modeling , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[24]  Vassilios Diakoloukas,et al.  Maximum-likelihood stochastic-transformation adaptation of hidden Markov models , 1999, IEEE Trans. Speech Audio Process..

[25]  Bin Ma,et al.  Online adaptive learning of continuous-density hidden Markov models based on multiple-stream prior evolution and posterior pooling , 2001, IEEE Trans. Speech Audio Process..

[26]  Chin-Hui Lee,et al.  HIDDEN MARKOV MODEL ADAPTATION USING MAXIMUM A POSTERIORI LINEAR REGRESSION , 1999 .

[27]  Joseph P. Olive,et al.  Text-to-speech synthesis , 1995, AT&T Technical Journal.

[28]  Arjun K. Gupta,et al.  Elliptically contoured models in statistics , 1993 .

[29]  Jen-Tzung Chien,et al.  Improved Bayesian learning of hidden Markov models for speaker adaptation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[30]  Biing-Hwang Juang,et al.  A study on speaker adaptation of the parameters of continuous density hidden Markov models , 1991, IEEE Trans. Signal Process..

[31]  Philip C. Woodland,et al.  Speaker adaptation of HMMs using linear regression , 1994 .