Detecting disfluency in spontaneous speech

This thesis reports a study of the perception of disfluency in spontaneous, conversational speech. Disfluent speech presents problems for both computational and psycholinguistic models of speech processing. The surface strings produced when speech is interrupted by disfluency require complex editing processes from computational models in order to produce well-formed strings for parsers. There is little empirical evidence about how the human speech processing mechanism deals with disfluencies, but our everyday experience of listening to speech suggests that we can deal with them very smoothly and efficiently. One of the first problems for a speech processor is to detect that disfluency has occurred. No reliable acoustic or prosodic cues have been identified which signal the presence of a discontinuity. The main aim of this thesis is to address this problem by first establishing detection points for a set of disfluent utterances and then finding out what acoustic and prosodic cues are available at these points. The main part of the study consists of a series of 5 perceptual experiments, followed by acoustic and prosodic analyses. The first 3 experiments establish detection points for disfluencies and relate these points to recognition points of the words in the vicinity of the interruption. The last 2 experiments examine the role of prosodic information in detecting disfluency, first over whole utterances and then focusing on the region of the interruption. The acoustic and prosodic analyses of the experimental stimuli match responses indicating disfluency detection to events in the speech signal which might act as cues. The results of the first 3 experiments show that disfluency can be recognised very early, usually within the first word after the interruption point. Importantly, it is also shown that disfluency can be detected before that word is recognised, which implies that non-syntactic information is used. The last 2 experiments confirm that prosodic information can be used to distinguish fluent from disfluent utterances. The acoustic and prosodic analyses suggest that a combination of cues can be of use. In the absence of "ungrammatical" pauses and broken-off words, a break in the signal is indicated by the absence of phonological linking between the words on either side of the interruption. It may be possible to identify other cues in future studies with larger data sets.
