A Comparison of Different Approaches to Automatic Speech Segmentation

We compare different methods for obtaining accurate speech segmentations starting from the corresponding orthography. The complete segmentation process can be decomposed into two basic steps. First, a phonetic transcription is automatically produced with the help of large vocabulary continuous speech recognition (LVCSR). Then, the phonetic transcription and the speech signal serve as input to a speech segmentation tool. We compare two automatic approaches to segmentation, one based on the Viterbi algorithm and the other on the Forward-Backward algorithm. Further, we develop several techniques to compensate for systematic biases between automatic and manual segmentations. Experiments evaluate both the generation of phonetic transcriptions and the different speech segmentation methods.
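To illustrate the difference between the two segmentation approaches, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a strictly left-to-right model with one state per phone and no transition probabilities, and a hypothetical input `loglik[t][k]` holding the log-likelihood of frame `t` under the `k`-th phone of the transcription. The Viterbi variant returns the hard boundaries of the single best path, whereas the Forward-Backward variant returns soft boundaries, taken here (as an assumption) to be the expected number of frames occupied by all phones up to each boundary.

```python
import math

NEG_INF = float("-inf")


def logsumexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def viterbi_boundaries(loglik):
    """Start frame of each phone on the best left-to-right path."""
    T, K = len(loglik), len(loglik[0])
    score = [[NEG_INF] * K for _ in range(T)]
    entered = [[False] * K for _ in range(T)]  # True if phone k starts at frame t
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else NEG_INF
            entered[t][k] = move > stay
            score[t][k] = max(stay, move) + loglik[t][k]
    # Backtrace from the last phone at the last frame.
    starts = [0] * K
    k = K - 1
    for t in range(T - 1, 0, -1):
        if entered[t][k]:
            starts[k] = t
            k -= 1
    return starts


def forward_backward_boundaries(loglik):
    """Soft boundary after each phone k: expected frames spent in phones 0..k."""
    T, K = len(loglik), len(loglik[0])
    alpha = [[NEG_INF] * K for _ in range(T)]
    beta = [[NEG_INF] * K for _ in range(T)]
    alpha[0][0] = loglik[0][0]
    for t in range(1, T):
        for k in range(K):
            move = alpha[t - 1][k - 1] if k > 0 else NEG_INF
            alpha[t][k] = logsumexp(alpha[t - 1][k], move) + loglik[t][k]
    beta[T - 1][K - 1] = 0.0  # every path must end in the last phone
    for t in range(T - 2, -1, -1):
        for k in range(K):
            stay = beta[t + 1][k] + loglik[t + 1][k]
            move = beta[t + 1][k + 1] + loglik[t + 1][k + 1] if k + 1 < K else NEG_INF
            beta[t][k] = logsumexp(stay, move)
    total = alpha[T - 1][K - 1]  # log-likelihood summed over all alignments
    boundaries = []
    for k in range(K - 1):
        expected = 0.0
        for t in range(T):
            for j in range(k + 1):
                # Posterior probability that frame t belongs to phone j.
                expected += math.exp(alpha[t][j] + beta[t][j] - total)
        boundaries.append(expected)
    return boundaries
```

When one alignment clearly dominates, the two functions agree; when the acoustic evidence near a boundary is ambiguous, the Forward-Backward estimate falls between the competing hard boundaries and can take non-integer frame values.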