The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure that weights the high-SNR regions in frequency more heavily than the low-SNR regions. For the weighting function we choose a "bandwidth broadened" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of the SNR, and the weighting becomes essentially constant in the noise-free case. The new measure has the dot product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10-speaker, isolated-digit database in a series of speaker-independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞ dB). The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs, and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49%, while the original Itakura distortion measure gives an error rate of 27.6%. The equivalent SNR improvement at low SNRs is about 5 to 7 dB.
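As a point of reference for the "dot product form" mentioned above, the following is a minimal sketch of the standard unweighted Itakura (log likelihood ratio) distortion computed in the autocorrelation domain. It is not the weighted measure proposed here, whose weighting function is not specified in enough detail in this abstract to reproduce. The function names, the LPC order, and the synthetic test signals are illustrative assumptions.

```python
import numpy as np

def autocorr(x, order):
    """Biased autocorrelation lags r[0..order] of one signal frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])

def lpc(r):
    """Levinson-Durbin recursion.

    Returns the inverse filter a = [1, a1, ..., ap] and the residual energy.
    """
    p = len(r) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err      # reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[: i + 1][::-1]
        err *= 1.0 - k * k
    return a, err

def coeff_autocorr(a):
    """Autocorrelation of the LPC coefficient vector itself."""
    return np.array([np.dot(a[: len(a) - k], a[k:]) for k in range(len(a))])

def itakura_dot(r_test, a_ref, err_test):
    """Itakura distortion via the dot product of two autocorrelation vectors."""
    ra = coeff_autocorr(a_ref)
    num = ra[0] * r_test[0] + 2.0 * np.dot(ra[1:], r_test[1:])
    return np.log(num / err_test)

def itakura_quadratic(r_test, a_ref, err_test):
    """Same distortion via the explicit quadratic form a^T R a (for checking)."""
    n = len(r_test)
    R = np.array([[r_test[abs(i - j)] for j in range(n)] for i in range(n)])
    return np.log(a_ref @ R @ a_ref / err_test)

# Illustrative usage on synthetic frames (assumed order 8, not from the paper).
rng = np.random.default_rng(0)
test_frame = rng.standard_normal(400)
ref_frame = np.convolve(rng.standard_normal(400), [1.0, 0.9])[:400]

r_test = autocorr(test_frame, 8)
a_test, err_test = lpc(r_test)
a_ref, _ = lpc(autocorr(ref_frame, 8))

print(itakura_dot(r_test, a_ref, err_test))
print(itakura_quadratic(r_test, a_ref, err_test))
```

The dot product version needs only p + 1 multiply-adds per comparison once the reference coefficient autocorrelation is precomputed, which is the computational efficiency the abstract refers to.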