Learning to Identify Fragmented Words in Spoken Discourse

Disfluent speech adds to the difficulty of processing spoken language utterances. In this paper we concentrate on identifying one disfluency phenomenon: fragmented words. Our data, from the Spoken Dutch Corpus, samples nearly 45,000 sentences of human discourse, ranging from spontaneous chat to media broadcasts. We classify each lexical item in a sentence as either a completely uttered word or an incompletely uttered, i.e. fragmented, word. The task is carried out by both the IB1 and RIPPER machine learning algorithms, trained on a variety of features with an extensive optimization strategy. Our best classifier achieves a 74.9% F-score, which is a significant improvement over the baseline. We discuss why memory-based learning has more success than rule induction in correctly classifying fragmented words.
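To make the reported metric concrete, the following is a minimal sketch of how an F-score for the fragmented-word class can be computed from per-token labels. It assumes the balanced F-score (beta = 1) measured on the fragmented class; the abstract does not state which variant was used. The function name, the label strings, and the toy data are hypothetical and are not taken from the paper.

```python
def f_score(gold, predicted, positive="fragment", beta=1.0):
    """F-score of the `positive` class from parallel label sequences.

    Assumption: `positive` marks fragmented words; the label names are
    illustrative only and do not come from the paper.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0

    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example: five tokens, three of them truly fragmented.
gold = ["complete", "fragment", "fragment", "complete", "fragment"]
pred = ["complete", "fragment", "complete", "fragment", "fragment"]
score = f_score(gold, pred)
```

Because fragmented words are rare relative to complete words, a score computed on the fragmented class alone is more informative than overall token accuracy, which a classifier could inflate by always predicting "complete".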
