A prosody-only decision-tree model for disfluency detection

Speech disfluencies (filled pauses, repetitions, repairs, and false starts) are pervasive in spontaneous speech. The ability to detect and correct disfluencies automatically is important for effective natural language understanding, as well as for improving speech models in general. Previous approaches to disfluency detection have relied heavily on lexical information, which makes them less applicable when word recognition is unreliable. We have developed a disfluency detection method using decision tree classifiers that use only local, automatically extracted prosodic features. Because the model does not rely on lexical information, it is widely applicable even when word recognition is unreliable. The model performed significantly better than chance at detecting four disfluency types. It also outperformed a language model in the detection of false starts, given the correct transcription. Combining the prosody model with a specialized language model improved accuracy over either model alone for the detection of false starts. Results suggest that a prosody-only model can aid the automatic detection of disfluencies in spontaneous speech.
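
The abstract does not describe how the trees are grown or how the prosody and language models are combined, so the following is only a minimal, hypothetical sketch of the general idea. It assumes a CART-style binary tree split on Gini impurity, whose nodes store the fraction of disfluent training examples as a posterior. The three features (`pause_dur`, `f0_delta`, `energy_delta`), the toy data, and the `combine` helper (a simple linear interpolation of the prosody and language-model posteriors) are illustrative assumptions, not the paper's actual feature set or combination method.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# HYPOTHETICAL prosodic features measured at a word boundary (illustration only):
# pause duration in seconds, normalized F0 change, normalized energy change.
FEATURES = ("pause_dur", "f0_delta", "energy_delta")

@dataclass
class Node:
    feature: Optional[int] = None    # index of the split feature; None for a leaf
    threshold: float = 0.0           # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    p_disfl: float = 0.0             # fraction of disfluent training samples here

def gini(labels: List[int]) -> float:
    """Gini impurity of a binary label list (1 = disfluency)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X: List[Sequence[float]], y: List[int]) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that most reduces weighted Gini impurity."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow(X, y, depth: int = 0, max_depth: int = 3, min_size: int = 2) -> Node:
    """Recursively grow a binary tree; every node stores P(disfluency)."""
    node = Node(p_disfl=sum(y) / len(y))
    if depth >= max_depth or len(y) < min_size:
        return node
    split = best_split(X, y)
    if split is None:                # no split lowers impurity: keep as leaf
        return node
    f, t = split
    node.feature, node.threshold = f, t
    go_left = [xi[f] <= t for xi in X]
    node.left = grow([xi for xi, g in zip(X, go_left) if g],
                     [yi for yi, g in zip(y, go_left) if g], depth + 1, max_depth, min_size)
    node.right = grow([xi for xi, g in zip(X, go_left) if not g],
                      [yi for yi, g in zip(y, go_left) if not g], depth + 1, max_depth, min_size)
    return node

def predict_proba(node: Node, x: Sequence[float]) -> float:
    """Follow the splits down to a leaf and return its disfluency posterior."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.p_disfl

def combine(p_prosody: float, p_lm: float, w: float = 0.5) -> float:
    """HYPOTHETICAL combination: linear interpolation of the two posteriors."""
    return w * p_prosody + (1 - w) * p_lm

# Toy, made-up training data: (pause_dur, f0_delta, energy_delta) -> label
X = [(0.02, 0.1, 0.0), (0.05, 0.0, 0.1), (0.08, 0.2, -0.1),
     (0.35, -0.4, -0.3), (0.50, -0.2, -0.4), (0.45, -0.5, -0.2)]
y = [0, 0, 0, 1, 1, 1]

tree = grow(X, y)
p = predict_proba(tree, (0.60, -0.3, -0.3))
print(FEATURES[tree.feature], tree.threshold, p, combine(p, 0.2))
```

On this toy data the tree makes a single split on pause duration. A practical model would need far more data and some form of pruning or held-out tuning of `max_depth`, and the interpolation weight `w` would likewise have to be tuned rather than fixed at 0.5.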
