Recognizing reverberant speech with RASTA-PLP

The performance of the PLP (perceptual linear predictive), log-RASTA-PLP, and J-RASTA-PLP front ends for recognition of highly reverberant speech is measured and compared with the performance of humans and the performance of an experimental RASTA-like front end on reverberant speech, and with the performance of a PLP-based recognizer trained on reverberant speech. While humans are able to reliably recognize the reverberant test set, achieving a 6.1% word error rate, the best RASTA-PLP-based recognizer has a word error rate of 68.7% on the same test set, and the PLP-based recognizer trained on reverberant speech has a 50.3% word error rate. Our experimental variant on RASTA processing provides a statistically significant improvement in performance on the reverberant speech, with a best word error rate of 64.1%.

[1]  Jont B. Allen,et al.  Image method for efficiently simulating small‐room acoustics , 1976 .

[2]  S. Gelfand,et al.  Effects of small room reverberation upon the recognition of some consonant features , 1979 .

[3]  R. G. Leonard,et al.  A database for speaker-independent digit recognition , 1984, ICASSP.

[4]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[5]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[6]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[7]  Oded Ghitza,et al.  A comparative study of mel cepstra and EIH for phone classification under adverse conditions , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[8]  Steven Greenberg,et al.  The modulation spectrogram: in pursuit of an invariant representation of speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.