Adaptation method based on HMM composition and EM algorithm

A method is proposed for adapting HMMs to additive noise and multiplicative distortion simultaneously. The method first creates a noise HMM for the additive noise, then composes HMMs for noisy, distorted speech from this noise HMM and the speech HMMs, so that the composed HMMs become functions of the signal-to-noise (S/N) ratio and the multiplicative distortion. The S/N ratio and the multiplicative distortion are estimated by maximizing the likelihood of the input speech given the composed HMMs. To achieve this, we propose a new method that divides the maximization into two steps: estimation of the S/N ratio and estimation of the cepstrum bias. The S/N ratio is estimated using the parallel model method, and the cepstrum bias is estimated using the EM algorithm. The method is evaluated in two experiments: phoneme recognition and connected digit recognition. The convergence guarantee of the algorithm is also discussed.
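As a rough illustration of the cepstrum-bias step only, the sketch below estimates a constant additive bias in the cepstral domain with EM. It is not the paper's algorithm: it assumes a diagonal-covariance Gaussian mixture as a stand-in for the composed HMMs, ignores additive noise and the S/N estimation, and the function and parameter names (`estimate_cepstrum_bias`, `obs`, `weights`, `means`, `variances`) are hypothetical.

```python
import numpy as np

def estimate_cepstrum_bias(obs, weights, means, variances, n_iter=20):
    """EM estimate of a constant bias b, assuming obs ~ GMM with means + b.

    obs:       (T, D) observed cepstral vectors
    weights:   (K,)   mixture weights (assumed to sum to 1)
    means:     (K, D) component means of the clean model
    variances: (K, D) diagonal variances of the clean model
    """
    bias = np.zeros(obs.shape[1])
    for _ in range(n_iter):
        # E-step: component posteriors under the bias-shifted model.
        diff = obs[:, None, :] - (means[None, :, :] + bias)       # (T, K, D)
        log_p = (np.log(weights)[None, :]
                 - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
                 - 0.5 * np.sum(diff ** 2 / variances[None], axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)                 # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)                 # (T, K)

        # M-step: variance-weighted average of residuals (obs - clean mean).
        resid = obs[:, None, :] - means[None, :, :]               # (T, K, D)
        num = np.sum(gamma[:, :, None] * resid / variances[None], axis=(0, 1))
        den = np.sum(gamma[:, :, None] / variances[None], axis=(0, 1))
        bias = num / den
    return bias
```

In a full system along the lines described above, this update would alternate with a separate S/N ratio estimate, and the posteriors would come from the HMM forward-backward pass rather than from a flat mixture.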
