Telephone band LVCSR for hearing-impaired users

Large vocabulary automatic speech recognition might assist hearing impaired telephone users by displaying a transcription of the incoming side of the conversation, but the system would have to achieve su cient accuracy on conversationalstyle, telephone-bandwidth speech. We describe our development work toward such a system. This work comprised three phases: Experiments with clean data ltered to 200-3500Hz, experiments with real telephone data, and language model development. In the rst phase, the speaker independent error rate was reduced from 25% to 12% by using MLLT, increasing the number of cepstral components from 9 to 13, and increasing the number of Gaussians from 30,000 to 120,000. The resulting system, however, performed less well on actual telephony, producing an error rate of 28.4%. By additional adaptation and the use of an LDA and CDCN combination, the error rate was reduced to 19.1%. Speaker adaptation reduces the error rate to 10.96%. These results were obtained with read speech. To explore the language-model requirements in a more realistic situation, we collected some conversational speech with an arrangement in which one participant could not hear the conversation but only saw recognizer output on a screen. We found that a mixture of language models, one derived from the Switchboard corpus and the other from prepared texts, resulted in approximately 10% fewer errors than either model alone.

[1]  Bernard Mérialdo,et al.  A Dynamic Language Model for Speech Recognition , 1991, HLT.

[2]  Ramesh A. Gopinath,et al.  Maximum likelihood modeling with Gaussian distributions for classification , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[3]  Alejandro Acero,et al.  Acoustical and environmental robustness in automatic speech recognition , 1991 .

[4]  Michael Picheny,et al.  Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[5]  Richard M. Stern,et al.  Efficient joint compensation of speech for the effects of additive noise and linear filtering , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.