In previous work, Multi-Environment Model based LInear Normalization (MEMLIN) was presented and shown to compensate effectively for environment mismatch. MEMLIN is an empirical feature vector normalization technique that models the clean and noisy feature spaces with Gaussian Mixture Models (GMMs). A critical component of the algorithm is the cross-probability model: the probability of a clean-model Gaussian given a noisy-model Gaussian and the noisy feature vector. In the previous work, the cross-probability model was approximated as time-independent. This paper proposes a time-dependent estimate of the cross-probability model based on GMMs. Experiments on the SpeechDat Car database were carried out to study the performance of the proposed estimate in a real acoustic environment. MEMLIN with the time-independent cross-probability model reached a mean improvement in Word Error Rate (WER) of 70.21%; with the time-dependent, GMM-based cross-probability model, the mean improvement rose to 78.47%.
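To make the role of the cross-probability model concrete, the following is a minimal sketch of the time-independent approximation only, not of the time-dependent estimate proposed here. The abstract gives no formulas, so everything in it is an assumption: the function names `cross_probability` and `normalize_frame` are hypothetical, the cross-probability table is estimated from soft co-occurrence counts over stereo (clean and noisy) frames, and the normalization subtracts pre-trained bias vectors weighted by the noisy-Gaussian posterior and the cross-probability. A single environment is assumed for brevity.

```python
def cross_probability(post_x, post_y):
    """Sketch: estimate a time-independent p(s_x | s_y) from stereo data.

    post_x[t][i] = p(s_x = i | x_t), clean-GMM posterior for frame t
    post_y[t][j] = p(s_y = j | y_t), noisy-GMM posterior for frame t
    Returns table[j][i], an estimate of p(s_x = i | s_y = j).
    Assumption: soft co-occurrence counts, normalized per noisy Gaussian.
    """
    n_x, n_y = len(post_x[0]), len(post_y[0])
    table = [[0.0] * n_x for _ in range(n_y)]
    # Accumulate joint soft counts over all paired frames.
    for px, py in zip(post_x, post_y):
        for j in range(n_y):
            for i in range(n_x):
                table[j][i] += py[j] * px[i]
    # Normalize each row so it sums to one over the clean Gaussians.
    for row in table:
        total = sum(row)
        if total > 0.0:
            for i in range(n_x):
                row[i] /= total
    return table


def normalize_frame(y, post_y_t, cross, bias):
    """Sketch: x_hat = y - sum_j sum_i bias[j][i] * p(s_y=j | y) * p(s_x=i | s_y=j).

    bias[j][i] is an assumed, pre-trained bias vector for the Gaussian pair
    (noisy j, clean i); it has the same dimension as y.
    """
    x_hat = list(y)
    for j, w_y in enumerate(post_y_t):
        for i, w_x in enumerate(cross[j]):
            for d in range(len(y)):
                x_hat[d] -= bias[j][i][d] * w_y * w_x
    return x_hat


# Toy stereo data: three frames, two Gaussians per model, hard posteriors.
post_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
post_y = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
cross = cross_probability(post_x, post_y)

# One-dimensional toy bias vectors, indexed as bias[noisy][clean].
bias = [[[1.0], [1.0]], [[0.5], [1.5]]]
x_hat = normalize_frame([2.0], [0.0, 1.0], cross, bias)
```

In this sketch the table depends only on the pair of Gaussians, so the same weights are applied to every frame. A time-dependent estimate, as proposed in the abstract, would instead let these weights vary with the noisy feature vector at each frame.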