Under noisy conditions, the redundancy of the speech signal means that some spectral bands (reliable bands) retain local SNRs high enough to be used effectively by a recognizer. This study proposes a novel, phonetically motivated Reliable Bands Guided similarity measure (RBG measure) with three features. First, frequency bands of the reference spectrum that have larger absolute energy or sharper spectral peaks are marked as reliable bands and are given more weight than the other bands in the definition of the measure. Second, within each reliable band, the similarity between the formant positions and formant shapes of the test spectrum and the reference spectrum is modelled explicitly. Third, the measure automatically emphasizes spectral bands whose amplitudes change abruptly, which normally contain more reliable dynamic features of the speech signal. Both the RBG measure and the Parallel Model Combination (PMC) method are tested on a speaker-independent, continuous Mandarin digit string recognition task under 15 noisy conditions, with noises drawn from the NOISEX92 database. On average, the word accuracy of the RBG measure is 4.22% below that of the PMC method at SNRs above 0 dB; at 0 dB, however, it outperforms the PMC method by 8.82%. More importantly, the RBG measure does not rely on accurate background noise modeling, which is a difficult task in itself.
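The first feature, marking reliable bands on the reference spectrum and weighting them more heavily, can be illustrated with a minimal sketch. This is not the RBG measure as defined in the study: the equal-width band split, the thresholds `energy_thresh` and `peak_thresh`, the weights `w_reliable` and `w_other`, and the squared-difference band distance are all assumptions chosen for illustration, and the formant position and shape modelling is omitted entirely.

```python
from typing import List, Sequence


def split_bands(spectrum: Sequence[float], n_bands: int) -> List[List[float]]:
    """Split a spectrum into n_bands contiguous, nearly equal-width bands.

    Assumes len(spectrum) >= n_bands so that no band is empty.
    """
    n = len(spectrum)
    edges = [i * n // n_bands for i in range(n_bands + 1)]
    return [list(spectrum[edges[i]:edges[i + 1]]) for i in range(n_bands)]


def mark_reliable_bands(ref: Sequence[float], n_bands: int,
                        energy_thresh: float, peak_thresh: float) -> List[bool]:
    """Flag a reference band as reliable if it has high mean energy
    or a sharp peak (hypothetical criterion: band maximum minus band mean)."""
    flags = []
    for band in split_bands(ref, n_bands):
        mean_energy = sum(band) / len(band)
        peak_sharpness = max(band) - mean_energy
        flags.append(mean_energy >= energy_thresh or peak_sharpness >= peak_thresh)
    return flags


def weighted_band_distance(test: Sequence[float], ref: Sequence[float],
                           reliable: Sequence[bool],
                           w_reliable: float = 2.0, w_other: float = 1.0) -> float:
    """Weighted average of per-band mean squared differences,
    with reliable bands given the larger (assumed) weight."""
    n_bands = len(reliable)
    total = 0.0
    weight_sum = 0.0
    for tb, rb, is_rel in zip(split_bands(test, n_bands),
                              split_bands(ref, n_bands), reliable):
        mse = sum((t - r) ** 2 for t, r in zip(tb, rb)) / len(tb)
        w = w_reliable if is_rel else w_other
        total += w * mse
        weight_sum += w
    return total / weight_sum


# Toy log-spectrum with one energetic, peaked band (values are made up).
ref = [1.0, 1.0, 5.0, 9.0, 1.0, 1.0, 2.0, 2.0]
reliable = mark_reliable_bands(ref, n_bands=4, energy_thresh=5.0, peak_thresh=1.5)

# The same deviation placed in a reliable band versus a non-reliable band.
test_in_reliable = [1.0, 1.0, 7.0, 9.0, 1.0, 1.0, 2.0, 2.0]
test_in_other = [3.0, 1.0, 5.0, 9.0, 1.0, 1.0, 2.0, 2.0]
d_reliable = weighted_band_distance(test_in_reliable, ref, reliable)
d_other = weighted_band_distance(test_in_other, ref, reliable)
```

In this toy case the same spectral deviation costs more when it falls inside the band flagged as reliable, which is the weighting behaviour the abstract describes for the first feature.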