MODELING SPEECH REPAIRS AND INTONATIONAL PHRASING TO IMPROVE SPEECH RECOGNITION

The spontaneous speech events of speech repairs and intonational phrasing disrupt the local context, which prevents traditional language models from properly predicting the words in the vicinity of these events. The solution is to use a language model that accounts for these spontaneous speech events. In this paper, we use such a model to rescore word graphs. Doing so yields a small but significant decrease in the word error rate of 1.2%, in addition to an improvement of 4.4% from modeling the syntactic role of the words. Furthermore, as the modeling of spontaneous speech events improves, word recognition results should also improve.
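The rescoring step mentioned above can be illustrated with a minimal sketch. It is not the model described in this paper: a toy bigram scorer stands in for the language model, and the function name `rescore_word_graph`, the edge format, and the `lm_weight` parameter are all assumptions made for illustration. The sketch only shows the general mechanism, in which each path through a word graph is scored by combining the recognizer's acoustic log scores with language model log probabilities, and the best-scoring path is returned.

```python
from collections import defaultdict


def rescore_word_graph(edges, start, end, lm_logprob, lm_weight=1.0):
    """Return (best_score, best_words) over all start-to-end paths.

    edges: list of (src, dst, word, acoustic_logprob). Node ids are integers
           and every edge runs from a lower id to a higher id (assumed DAG).
    lm_logprob(prev_word, word): hypothetical bigram log probability; a
           richer model would condition on more context.
    """
    outgoing = defaultdict(list)
    for src, dst, word, acoustic in edges:
        outgoing[src].append((dst, word, acoustic))

    # best[(node, previous word)] = (score, words) of the best partial path.
    best = {(start, "<s>"): (0.0, [])}

    # Visiting source nodes in increasing id order guarantees that all
    # partial paths into a node are final before that node is expanded.
    for node in sorted({src for src, _, _, _ in edges}):
        states = [(key, val) for key, val in best.items() if key[0] == node]
        for (_, prev), (score, words) in states:
            for dst, word, acoustic in outgoing[node]:
                new_score = score + acoustic + lm_weight * lm_logprob(prev, word)
                key = (dst, word)
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, words + [word])

    finals = [val for key, val in best.items() if key[0] == end]
    if not finals:
        raise ValueError("no path from start to end")
    return max(finals, key=lambda val: val[0])


# Toy word graph: the acoustic scores alone slightly prefer "won't".
toy_edges = [
    (0, 1, "I", -1.0),
    (1, 2, "want", -2.0),
    (1, 2, "won't", -1.5),
    (2, 3, "to", -1.0),
]

# Toy bigram table; unseen bigrams receive a fixed penalty (an assumption).
toy_bigrams = {
    ("<s>", "I"): -0.5,
    ("I", "want"): -0.5,
    ("I", "won't"): -3.0,
    ("want", "to"): -0.3,
    ("won't", "to"): -4.0,
}


def toy_lm(prev, word):
    return toy_bigrams.get((prev, word), -5.0)


score, words = rescore_word_graph(toy_edges, 0, 3, toy_lm)
print(words, round(score, 2))
```

In this toy graph the acoustic scores alone favor "I won't to", while adding the language model scores selects "I want to". A model that also accounts for speech repairs and intonational phrasing would replace `toy_lm` in this scheme.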
