A projection-based likelihood measure for speech recognition in noise

Investigates a projection-based likelihood measure that significantly improves automatic speech recognition performance in the presence of additive broadband noise. The measure was developed by modifying likelihood scores in continuous Gaussian density hidden Markov models (HMMs), resulting in the weighted projection measure (WPM). Experimental results using the proposed measure are reported for several performance factors: different cepstral-based parameters, normal and multistyle speech, and various noise signals, including white, jittering white, and broadband colored noise. In all cases, significant improvements in speaker-dependent, isolated word recognition were achieved using the WPM instead of the standard Gaussian likelihood measure (weighted Euclidean distance (WED)). As an example, at a SNR of 5 dB, the WPM resulted in improvement in recognition accuracy from 19.4 to 80.6% compared with the standard WED for the DFT mel-cepstral representation. >

[1]  Brian A. Hanson,et al.  Spectral slope distance measures with linear prediction analysis for word recognition in noise , 1987, IEEE Trans. Acoust. Speech Signal Process..

[2]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[3]  Biing-Hwang Juang,et al.  A family of distortion measures based upon projection operation for robust speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[4]  Man Mohan Sondhi,et al.  A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Kuldip K. Paliwal Estimation of noise variance from the noisy AR signal and its application in speech enhancement , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Jr. G. Forney,et al.  The viterbi algorithm , 1973 .

[7]  John H. L. Hansen,et al.  Constrained iterative speech enhancement with application to speech recognition , 1991, IEEE Trans. Signal Process..

[8]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[9]  Yariv Ephraim,et al.  A minimum mean square error approach for speech enhancement , 1990, International Conference on Acoustics, Speech, and Signal Processing.