A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise

The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure that weights the high-SNR regions in frequency more heavily than the low-SNR regions. For the weighting function we choose a "bandwidth-broadened" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of the SNR, and becomes essentially constant in the noise-free case. The new measure retains the dot-product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10-speaker, isolated-digit database in a series of speaker-independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞ dB). The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs, and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49%, while the original Itakura distortion gives an error rate of 27.6%. The equivalent SNR improvement at low SNRs is about 5 to 7 dB.
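As an illustration of the dot-product form mentioned above, the following is a minimal sketch of the unweighted Itakura distortion in its common log-likelihood-ratio form, together with a helper that adds white Gaussian noise at a chosen SNR. It does not implement the frequency-weighted measure, whose weighting function is not specified in this abstract. The function names, the LPC order of 8, and the synthetic second-order autoregressive test frame are all assumptions made for the example.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise whose expected power gives the requested SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def autocorr(x, order):
    """Autocorrelation lags 0..order (unnormalized sums of lag products)."""
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])

def lpc(r, order):
    """Inverse-filter coefficients [1, a_1, ..., a_p] from the normal equations."""
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def residual_energy(a, r):
    """Quadratic form a^T R a evaluated as a dot product of autocorrelations:
    r(0) * r_a(0) + 2 * sum_k r(k) * r_a(k), where r_a is the autocorrelation
    of the coefficient sequence a."""
    r_a = autocorr(a, len(a) - 1)
    return r[0] * r_a[0] + 2.0 * np.dot(r[1:], r_a[1:])

def itakura_distortion(a_ref, a_test, r_test):
    """Log ratio of the residual energies obtained by passing the test frame
    through the reference and the test inverse filters."""
    return np.log(residual_energy(a_ref, r_test) / residual_energy(a_test, r_test))

# Hypothetical usage on a synthetic frame (assumed AR(2) process, order 8).
rng = np.random.default_rng(0)
excitation = rng.normal(size=240)
clean = np.zeros(240)
for n in range(2, 240):
    clean[n] = 1.3 * clean[n - 1] - 0.7 * clean[n - 2] + excitation[n]

noisy = add_white_noise(clean, snr_db=5.0, rng=rng)
order = 8
a_clean = lpc(autocorr(clean, order), order)   # plays the role of the reference
r_noisy = autocorr(noisy, order)               # plays the role of the test frame
a_noisy = lpc(r_noisy, order)
print("Itakura distortion, clean reference vs. noisy test:",
      itakura_distortion(a_clean, a_noisy, r_noisy))
```

Because the test frame's own coefficients minimize the residual energy, the distortion is non-negative and is zero when reference and test coefficients coincide. Only order + 1 multiply-adds per comparison are needed once the autocorrelations are available, which is the computational efficiency the abstract refers to.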