This paper proposes an unsupervised noisy-environment adaptation algorithm for HMM acoustic models that uses MLLR and a multispeaker speech database. The input is a single arbitrary sentence uttered by the target speaker, together with living room noise. From these, noise-superimposed adaptation data are generated out of the speech database, so a large amount of adaptation data can be obtained without burdening the speaker. The adaptation procedure consists of three stages: (1) GMM-based speaker identification selects from the database the speakers acoustically closest to the input speaker; (2) utterances read by the selected speakers are extracted from the database, and living room noise is superimposed on them; (3) MLLR adaptation is performed using these noise-superimposed utterances as adaptation samples. By further combining this method with unsupervised speaker adaptation based on sufficient statistics and speaker distance, and with HMM synthesis, a highly accurate unsupervised integrated adaptation system is constructed. In large-vocabulary continuous speech recognition experiments, the model adapted by the proposed method achieves recognition accuracy equal to or better than that of the environment-matched model, and performance close to that of supervised MLLR using several tens of adaptation samples. In a noisy environment with an SNR of 20 dB, the proposed adaptation system improved the recognition rate from 48.3% to 70.5% with the monophone model, and from 60.1% to 79.9% with the PTM model.

© 2005 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 89(3): 48–58, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20227
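The three-stage procedure can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions: each database speaker is represented by a diagonal-covariance GMM given as a `(weights, means, variances)` tuple; "acoustic distance" is approximated by the average per-frame log-likelihood of the input sentence under each speaker's GMM; the living room noise is scaled to a target SNR before being added; and MLLR is reduced to a single global mean transform, with state occupation probabilities assumed to come from an alignment of the adaptation data. All function names are hypothetical.

```python
import numpy as np


def diag_gmm_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of features x (T, D) under a
    diagonal-covariance GMM: weights (K,), means and variances (K, D)."""
    diff = x[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)    # (K,)
    log_comp = ((np.log(weights) + log_norm)[None, :]
                - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2))  # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                        # log-sum-exp over K
    frame_ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return frame_ll.mean()


def select_close_speakers(x, speaker_gmms, n_select):
    """Stage (1), assumed form: rank database speakers by how well their GMM
    explains the input sentence and keep the n_select best ("closest")."""
    scores = {spk: diag_gmm_loglik(x, *gmm) for spk, gmm in speaker_gmms.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_select]


def superimpose_noise(speech, noise, snr_db):
    """Stage (2): add noise to a waveform, scaled so the speech-to-noise
    power ratio equals snr_db. Noise is tiled or truncated to the speech length."""
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


def estimate_global_mllr(frames, gammas, means, variances):
    """Stage (3), simplified: one global MLLR mean transform W of shape (D, D+1)
    for diagonal-covariance Gaussians; the adapted mean is W @ [1, mu].
    frames: (T, D) features; gammas: (T, M) occupation probabilities;
    means, variances: (M, D) Gaussian parameters."""
    D = frames.shape[1]
    M = means.shape[0]
    xi = np.hstack([np.ones((M, 1)), means])      # extended mean vectors (M, D+1)
    occ = gammas.sum(axis=0)                      # total occupancy per Gaussian (M,)
    W = np.zeros((D, D + 1))
    for i in range(D):
        inv_var = 1.0 / variances[:, i]
        # G_i = sum_m occ_m / var_mi * xi_m xi_m^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # k_i = sum_m (sum_t gamma_mt * o_ti) / var_mi * xi_m
        k = xi.T @ (inv_var * (gammas.T @ frames[:, i]))
        W[i] = np.linalg.solve(G, k)              # row i: w_i = G_i^{-1} k_i
    return W
```

In a full system, the frames passed to `estimate_global_mllr` would be features extracted from the noisy utterances produced by stage (2), and the estimated transform would then be applied to every Gaussian mean of the acoustic model. A single global transform is the simplest choice; practical MLLR systems usually share transforms across regression classes of Gaussians instead.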