Laughter extracted from television closed captions as speech recognizer training data

Closed captions in television broadcasts, intended to aid the hearing impaired, also have potential as training data for speech-recognition software. The use of closed captions to extract virtually unlimited training data automatically has already been demonstrated [1]. This paper reports preliminary work on using the non-speech sound tokens included in closed captions to extract training data that augments a speech recognizer’s repertoire of non-speech phonemes. In a small experiment, laughter sounds in television news broadcasts were located using the Informedia Digital Video Library’s retrieval capabilities, which automatically exploit closed captions. The snippets found were used to retrain a speech recognizer. A limited test showed a small but significant gain in performance. In future work, we plan to develop this approach into a fully automatic procedure for extracting training data for non-speech sounds.
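The caption-based search for laughter can be illustrated with a minimal sketch. It does not reproduce the Informedia retrieval system; it only assumes that captions are available as timed cues and that laughter is marked by a bracketed or parenthesized token such as "[LAUGHTER]". The `Cue` type, the token pattern, and the `pad` parameter are hypothetical names introduced for illustration. The padding is likewise an assumption, included because caption timestamps are usually only approximately aligned with the audio.

```python
import re
from dataclasses import dataclass

# Hypothetical laughter markers; real caption streams may use other conventions.
LAUGHTER_TOKEN = re.compile(
    r"[\[(]\s*(laughter|laughs|laughing)\s*[\])]", re.IGNORECASE
)


@dataclass
class Cue:
    """One timed caption cue (an assumed representation, not a standard format)."""
    start: float  # seconds from the start of the broadcast
    end: float
    text: str


def laughter_windows(cues, pad=2.0):
    """Return padded (start, end) audio windows for cues containing a laughter token."""
    windows = []
    for cue in cues:
        if LAUGHTER_TOKEN.search(cue.text):
            # Widen the window on both sides, clamping the start at zero.
            windows.append((max(0.0, cue.start - pad), cue.end + pad))
    return windows


cues = [
    Cue(10.0, 12.5, "GOOD EVENING."),
    Cue(12.5, 14.0, "[LAUGHTER]"),
    Cue(30.0, 31.0, "(Laughs) THAT'S RIGHT."),
]
print(laughter_windows(cues))  # [(10.5, 16.0), (28.0, 33.0)]
```

Each returned window marks a candidate audio snippet. In a pipeline like the one described here, such snippets would still need to be checked or aligned before being used to retrain a recognizer.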