Adding robustness to language models for spontaneous speech recognition

Compared to dictation systems, recognition systems for spontaneous speech still perform rather poorly. An important weakness of these systems is the statistical language model, which suffers both from the lack of large amounts of stylistically matching training data and from the occurrence of disfluencies in the recognition input. In this paper we investigate a method for improving the robustness of a language model for spontaneous speech by flexibly manipulating the prediction context when disfluencies occur. For repetitions, we obtained significantly better recognition results on a benchmark Switchboard test set.
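
As a rough illustration of what manipulating the prediction context can mean for repetitions, the sketch below builds an n-gram context that skips immediately repeated words, so that the next word is predicted from the fluent part of the history. The abstract does not specify the actual mechanism; the function name `clean_context`, the `order` parameter, and the rule of dropping exact adjacent repeats are assumptions made for this example only.

```python
def clean_context(history, order=3):
    """Return the (order - 1)-word prediction context for an n-gram model.

    Assumption for illustration: a repetition is a word that is
    immediately followed by an identical word, and only one copy
    is kept in the context.
    """
    context = []  # collected from the most recent word backwards
    for word in reversed(history):
        # Skip a word identical to the one kept right after it.
        if context and context[-1] == word:
            continue
        context.append(word)
        if len(context) == order - 1:
            break
    return tuple(reversed(context))


# A plain trigram model would condition on ("to", "to") here;
# the cleaned context conditions on ("want", "to") instead.
print(clean_context(["i", "want", "to", "to"]))
```

The intent of such a rule would be to keep the model from conditioning on word sequences like "to to", which are rare and poorly estimated in the training data.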
