Automatic disfluency removal on recognized spontaneous speech - rapid adaptation to speaker-dependent disfluencies

In this paper, we investigate methods for adapting a disfluency removal system to different data properties. We present a gradient descent algorithm for parameter optimization that achieves 85.1% recall and 93.1% precision on the English Verbmobil corpus and 53.0% recall and 79.0% precision on the Mandarin Chinese CallHome corpus. These results are comparable to those obtained by hand-optimizing the parameters on the test set. Furthermore, we investigate the impact of cross-validation and training set selection on recognizer output. Finally, we examine speaker-dependent disfluency production behavior and cluster the training data accordingly in order to improve the overall system.
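As an illustration of how the recall and precision figures above can be read, the following minimal sketch scores a set of hypothesized disfluent tokens against a reference annotation. It assumes scoring at the level of individual deleted tokens, which may differ from the exact scoring scheme used for the reported results; the function name `deletion_scores` and the example utterance are hypothetical.

```python
def deletion_scores(reference, hypothesis):
    """Return (recall, precision) for disfluency deletion decisions.

    reference:  token indices annotated as disfluent (assumed gold standard)
    hypothesis: token indices the system chose to delete
    """
    reference, hypothesis = set(reference), set(hypothesis)
    hits = len(reference & hypothesis)  # correctly deleted tokens
    # Recall: fraction of annotated disfluent tokens that were deleted.
    recall = hits / len(reference) if reference else 0.0
    # Precision: fraction of deleted tokens that were annotated as disfluent.
    precision = hits / len(hypothesis) if hypothesis else 0.0
    return recall, precision


# Hypothetical utterance "I I want uh a flight": tokens 0 and 3 are disfluent.
# The system deletes tokens 0, 2 and 3, so one fluent token is removed in error.
recall, precision = deletion_scores({0, 3}, {0, 2, 3})
```

In this example every annotated disfluency is removed, so recall is perfect, while the erroneous deletion of a fluent token lowers precision. A higher precision than recall, as in the figures above, corresponds to a system that rarely deletes fluent words but leaves some disfluencies in place.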