It is well known that additive and channel noise cause shifts and scaling in MFCC features. Empirical normalization techniques that estimate and compensate for these effects, such as cepstral mean subtraction and variance normalization, have been shown to be useful. However, these empirical estimates may not be optimal. In this paper, we approach the problem from two directions: 1) using more robust MFCC-based features that are less sensitive to additive and channel noise, and 2) proposing a maximum likelihood (ML)-based approach to compensate for the noise effects. In addition, we propose multi-class normalization, in which different normalization factors can be applied to different phonetic units. The combination of the robust features and ML normalization is particularly useful for the highly mismatched condition of the Aurora 3 corpus, yielding a 15.8% relative improvement in the highly mismatched case and a 10.4% relative improvement on average over the three conditions.
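The empirical baseline mentioned above, cepstral mean subtraction combined with variance normalization (CMVN), can be sketched as follows. This is a minimal illustration that assumes per-utterance statistics and MFCC frames stored in a NumPy array of shape `(num_frames, num_coeffs)`. It is not the ML-based or multi-class normalization proposed in the paper. The function name `cmvn` and the `eps` guard are illustrative choices. The sketch shows why CMVN targets the shift-and-scaling distortion: if each coefficient is distorted as `y = a * x + b` with `a > 0`, subtracting the utterance mean removes `b` and dividing by the standard deviation removes `a`.

```python
import numpy as np


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization (illustrative sketch).

    features: array of shape (num_frames, num_coeffs), e.g. MFCC frames.
    Each coefficient is shifted to zero mean (cepstral mean subtraction) and
    scaled to unit variance (variance normalization), using statistics
    estimated from the utterance itself.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)   # per-coefficient shift estimate
    std = features.std(axis=0)     # per-coefficient scale estimate
    return (features - mean) / (std + eps)  # eps avoids division by zero


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(200, 13))            # stand-in for clean MFCC frames
    scale = rng.uniform(0.5, 2.0, size=13)        # hypothetical per-coefficient scaling
    shift = rng.normal(size=13)                   # hypothetical per-coefficient shift
    distorted = clean * scale + shift
    # After CMVN, the clean and distorted versions should nearly coincide.
    print(np.max(np.abs(cmvn(clean) - cmvn(distorted))))
```

Because the shift and scale are estimated from the utterance alone, these are exactly the kind of empirical estimates the abstract suggests may be suboptimal.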