Decoding-time prediction of non-verbalized punctuation

This paper presents novel methods that integrate lexical prediction of non-verbalized punctuation with Viterbi decoding for Large Vocabulary Conversational Speech Recognition (LVCSR) in a single pass. We describe two approaches: one based on a modified finite state machine representation of language models, and one based on an extension of an LVCSR decoder. We discuss their advantages over traditional punctuation prediction approaches based on post-processing of recognition hypotheses, and we evaluate the proposed methods experimentally using a state-of-the-art LVCSR decoder. Experiments on a medical documentation corpus demonstrate that the proposed methods improve punctuation prediction accuracy while reducing system complexity and memory requirements.
