A corpus-based study of repair cues in spontaneous speech.

The occurrence of disfluencies in fully natural speech poses difficult challenges for spoken language understanding systems. For example, although self-repairs occur in about 10% of spontaneous utterances, they are often left unmodeled in speech recognition systems. This is partly because little is known about the extent to which cues in the speech signal may facilitate automatic repair processing. In this paper, acoustic and prosodic cues to self-repairs are identified, based on an analysis of a corpus taken from the ARPA Air Travel Information System database. Methods are proposed for exploiting these cues for repair detection, especially for the task of modeling word fragments, and for repair correction. The relative contributions of these speech-based cues, as well as other text-based repair cues, are examined in a statistical model of repair site detection that achieves a precision of 91% and a recall of 86% on a prosodically labeled corpus of repair utterances.
