Disfluency Detection Using Multi-step Stacked Learning

In this paper, we propose a multi-step stacked learning model for disfluency detection. Our method incorporates refined n-gram features step by step from different word sequences. First, we detect filler words. Second, edited words are detected using n-gram features extracted from both the original text and the filler-filtered text. In the third step, additional n-gram features are extracted from the edit-removed text, together with our newly induced in-between features, to further improve edited word detection. We use Max-Margin Markov Networks (M³Ns) as the classifier, with a weighted Hamming loss to balance precision and recall. Experiments on the Switchboard corpus show that the refined n-gram features from multiple steps and M³Ns with the weighted Hamming loss significantly improve performance. Our method achieves the best reported F-score of 0.841 for disfluency detection without the use of additional resources.

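To make the role of the weighted Hamming loss concrete, the following is a minimal sketch of how such a loss can enter a standard M³N training objective with margin scaling. This is the generic formulation, not the exact objective from the paper; the per-label costs c_edited and c_other, the feature map f, and the slack variables ξ are assumed for illustration.

\ell(\mathbf{y}, \hat{\mathbf{y}}) = \sum_{t=1}^{T} c_{y_t}\,\mathbf{1}[\hat{y}_t \neq y_t], \qquad c_{\text{edited}} > c_{\text{other}}

\min_{\mathbf{w},\,\boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad \mathbf{w}^{\top}\big[\mathbf{f}(\mathbf{x}_i, \mathbf{y}_i) - \mathbf{f}(\mathbf{x}_i, \mathbf{y})\big] \ \geq\ \ell(\mathbf{y}_i, \mathbf{y}) - \xi_i \quad \forall i,\ \forall \mathbf{y}.

Under such a scheme, raising the cost assigned to errors on edited words enlarges the required margin for mislabeling them, which is one way a weighted loss can trade precision against recall.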