Handling Disfluencies in Spontaneous Language Models

In automatic speech recognition, a stochastic language model (LM) predicts the probability of the next word on the basis of the previously recognized words. For the recognition of dictated speech, this method works reasonably well, since sentences are typically well-formed and the probabilities can be estimated reliably from large amounts of written text. For spontaneous speech, however, the situation is quite different: disfluencies distort the normal flow of sentences, and written transcripts of spontaneous speech are too scarce to train good stochastic LMs. Both factors contribute to the poor performance of automatic speech recognizers on spontaneous input. In this paper, we investigate how one specific approach to disfluencies in spontaneous language modeling influences recognition performance.
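To make the first point concrete, the following is a minimal sketch of a stochastic LM, assuming a bigram model with add-one smoothing. It is not the model or the disfluency approach studied in the paper; the toy corpus, the filled pause "uh", and the function names `train_bigram` and `bigram_prob` are hypothetical and serve only to illustrate how a disfluency in the history can weaken the prediction of the next word.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (toy data assumed)."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1
    return history_counts, bigram_counts, vocab

def bigram_prob(prev, nxt, history_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(nxt | prev)."""
    return (bigram_counts[(prev, nxt)] + 1) / (history_counts[prev] + len(vocab))

# Hypothetical "written text" training material: well-formed, no disfluencies.
corpus = [
    ["i", "want", "a", "flight"],
    ["i", "want", "a", "ticket"],
]
counts = train_bigram(corpus)

# Fluent history: "want" was observed before "a" in training.
fluent = bigram_prob("want", "a", *counts)

# Disfluent history: the filled pause "uh" never occurs in the written corpus,
# so the model falls back to the smoothed, near-uniform estimate.
disfluent = bigram_prob("uh", "a", *counts)

print(f"P(a | want) = {fluent:.3f}")
print(f"P(a | uh)   = {disfluent:.3f}")
```

In this toy setting, the word following the unseen filled pause receives a much lower probability than the same word after a fluent history, which is one way a model trained on written text can be misled by spontaneous input.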
