Paper information - The harming part of room acoustics in automatic speech recognition

The harming part of room acoustics in automatic speech recognition

Automatic speech recognition (ASR) systems used in real indoor scenarios suffer from noise and reverberation conditions that differ from the training conditions. This article describes a study that aims to find out which parts of reverberation are most harmful to speech recognition. Noise influences are left out. To this end, real room impulse responses are measured in different rooms and at different speaker-to-microphone distances, and then modified. The results of the recognition experiments with the related convolved impulse responses clearly show a dependency on early and late as well as high- and low-frequency reflections. Conclusions concerning the design of a dereverberation method are drawn.
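The kind of impulse-response modification described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the synthetic impulse response, the 50 ms early/late boundary, the 16 kHz sample rate, and the function names `synthetic_rir`, `split_early_late`, and `reverberate` are all assumptions made for illustration. The sketch covers only the early/late split, not the high/low frequency modification.

```python
import numpy as np


def synthetic_rir(fs=16000, rt60=0.5, length_s=0.6, seed=0):
    """Toy impulse response (hypothetical stand-in for a measured one):
    a unit direct path followed by exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Amplitude envelope that drops by 60 dB after rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = 0.1 * rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct sound
    return rir


def split_early_late(rir, fs, boundary_ms=50.0):
    """Split an impulse response into early and late parts.
    The boundary (assumed 50 ms) is counted from the strongest peak,
    which is taken to be the direct sound."""
    direct = int(np.argmax(np.abs(rir)))
    cut = direct + int(round(fs * boundary_ms / 1000.0))
    early = rir.copy()
    early[cut:] = 0.0  # keep direct sound and early reflections only
    late = rir.copy()
    late[:cut] = 0.0   # keep the late reverberation tail only
    return early, late


def reverberate(speech, rir):
    """Simulate a room by convolving a dry signal with an impulse response."""
    return np.convolve(speech, rir)


fs = 16000
rir = synthetic_rir(fs)
early, late = split_early_late(rir, fs)

# Placeholder for a dry speech recording: one second of noise.
speech = np.random.default_rng(1).standard_normal(fs)

speech_early = reverberate(speech, early)
speech_late = reverberate(speech, late)
speech_full = reverberate(speech, rir)
```

Because convolution is linear, `speech_early + speech_late` equals `speech_full` up to rounding, so each modified response isolates the contribution of one part of the reverberation. In an experiment of the kind the abstract outlines, each variant would be passed to the recognizer separately.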
