Unsupervised Prosodic Break Detection in Mandarin Speech

We propose an approach in which an automatic prosodic break detector for Mandarin speech is trained without any prosodically labeled training data. Using only lexical and acoustic cues, we first automatically create a small labeled training set, and then train the detector with semi-supervised learning. As the learning algorithm, we propose a generative mixture model that can learn from both labeled and unlabeled data. Experiments on both English and Mandarin corpora validate the approach.