Unsupervised adaptation using structural Bayes approach

It is well known that the performance of recognition systems often degrades substantially when the training and testing environments are mismatched. It is desirable to compensate for this mismatch while the system is in operation, without any supervised learning. A structural maximum a posteriori (SMAP) adaptation approach, which assumes a hierarchical structure in the parameter space, was proposed previously. In this paper, the SMAP method is applied to unsupervised adaptation. A novel normalization technique is also introduced as a front end for the adaptation process. The recognition results showed that the proposed method was effective even when only one utterance from a new speaker was used for adaptation. Furthermore, an effective way of combining supervised and unsupervised adaptation was investigated to reduce the need for a large amount of supervised learning data.
