Using Suprasegmental Information in Recognized Speech Punctuation Completion

We propose a scheme for punctuating the text produced by an automatic speech recognizer. Commas are added on the basis of the recognized text, while full stops are detected using both textual and prosodic information. We also propose an extended scheme that uses enriched audio document information (e.g. speaker diarization and language detection) to improve sentence boundary detection. We compare these schemes in terms of their accuracy, measured by correctly and incorrectly estimated punctuation marks, and their ability to mark the positions of sentence boundaries. Our aim is to show that incorporating all relevant information sources into a single scheme is better than splitting the document processing into independent layers. The proposed schemes are evaluated on a set of recordings from Czech (and Czechoslovak) radio broadcasts.
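
As an illustration of what counting correctly and incorrectly estimated punctuation marks could look like, the following is a minimal sketch. It is not the evaluation procedure of this work. It assumes that the reference and the hypothesis contain the same word sequence, that punctuation marks are separate whitespace-delimited tokens, and that only commas and full stops are scored. The function names `punct_slots` and `score` are hypothetical.

```python
from collections import Counter

# Assumption: only commas and full stops are scored.
PUNCT = {",", "."}


def punct_slots(tokens):
    """Return, for each word, the punctuation mark that follows it ('' if none)."""
    slots = []
    for tok in tokens:
        if tok in PUNCT:
            if slots:               # ignore a mark that precedes the first word
                slots[-1] = tok
        else:
            slots.append("")
    return slots


def score(reference, hypothesis):
    """Count punctuation hits and errors between two punctuated word sequences."""
    ref = punct_slots(reference.split())
    hyp = punct_slots(hypothesis.split())
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must contain the same words")

    counts = Counter()
    for r, h in zip(ref, hyp):
        if h and h == r:
            counts["correct"] += 1      # same mark in the same slot
        elif h and r:
            counts["substituted"] += 1  # a mark in both, but a different one
        elif h:
            counts["inserted"] += 1     # mark only in the hypothesis
        elif r:
            counts["deleted"] += 1      # mark only in the reference

    hyp_total = counts["correct"] + counts["substituted"] + counts["inserted"]
    ref_total = counts["correct"] + counts["substituted"] + counts["deleted"]
    result = dict(counts)
    for key in ("correct", "substituted", "inserted", "deleted"):
        result.setdefault(key, 0)
    result["precision"] = counts["correct"] / hyp_total if hyp_total else 0.0
    result["recall"] = counts["correct"] / ref_total if ref_total else 0.0
    return result


if __name__ == "__main__":
    print(score("hello , world . how are you .",
                "hello world . how , are you ."))
```

Sentence boundary detection could be scored in the same way by restricting `PUNCT` to the full stop, so that only sentence-final slots are counted.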