Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain

This paper extends a novel approach to automatic speech recognition in highly reverberant environments, proposed in [1] for isolated word recognition, to continuous speech recognition (CSR). The approach is based on a combined acoustic model consisting of a network of clean-speech HMMs and a reverberation model. Because grammatical information and information about the acoustic environment are strictly separated in the combined model, the system can be adapted flexibly to new tasks and new environments. We show that virtually all known CSR search algorithms can decode the proposed combined model once a few extensions are added. In a simulated connected-digit recognition task, the proposed method reduces the word error rate by more than 40 % compared to a conventional HMM-based system trained on reverberant speech, at the cost of increased decoding complexity.
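The abstract does not spell out the form of the reverberation model. As a rough illustration only, the sketch below assumes one common way of describing reverberation in a feature domain: each reverberant feature frame is a weighted sum of the current and preceding clean-speech frames, computed separately for each feature channel. The function name `reverberate_features`, the list-of-frames layout, and the fixed (deterministic) weights are assumptions made for this example and are not taken from the paper.

```python
def reverberate_features(clean, reverb):
    """Frame-wise convolution of clean feature frames with reverberation weights.

    clean  -- list of frames; each frame is a list of per-channel energies
    reverb -- list of frames of per-channel weights (hypothetical model;
              index 0 applies to the current frame, index m to the frame
              m steps in the past)
    Returns a list of reverberant frames with the same length as `clean`.
    """
    n_channels = len(clean[0])
    out = []
    for n in range(len(clean)):
        frame = [0.0] * n_channels
        for m, weights in enumerate(reverb):
            if n - m < 0:
                break  # no clean frames exist before the start of the utterance
            for c in range(n_channels):
                frame[c] += weights[c] * clean[n - m][c]
        out.append(frame)
    return out


# A single clean frame followed by silence smears into the next frame.
clean = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
reverb = [[1.0, 1.0], [0.5, 0.25]]
reverberant = reverberate_features(clean, reverb)
```

In this toy case the energy of the first frame leaks into the second one, which is the kind of inter-frame dependency that a clean-speech HMM alone does not capture and that a separate reverberation model is meant to account for.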

[1] R. G. Leonard et al. A database for speaker-independent digit recognition, 1984, ICASSP.

[2] Walter Kellermann et al. Hands-free speech recognition using a reverberation model in the feature domain, 2006, 2006 14th European Signal Processing Conference.

[3] Roger K. Moore et al. Hidden Markov model decomposition of speech and noise, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[4] Shigeki Sagayama et al. Model adaptation by state splitting of HMM for long reverberation, 2005, INTERSPEECH.

[5] Hermann Ney et al. Dynamic programming search for continuous speech recognition, 1999, IEEE Signal Process. Mag.

[6] Alexander Fischer et al. Acoustic synthesis of training data for speech recognition in living room environments, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[7] Maurizio Omologo et al. Training of HMM with filtered speech material for hands-free recognition, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8] Alex Acero et al. Spoken Language Processing: A Guide to Theory, Algorithm and System Development, 2001.

[9] Mark J. F. Gales et al. Improving environmental robustness in large vocabulary speech recognition, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[10] Jont B. Allen et al. Image method for efficiently simulating small-room acoustics, 1976.