Effects of disfluencies, predictability, and utterance position on word form variation in English conversation.

Function words, especially frequently occurring ones such as the, that, and, and of, vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., [ði], [ðæt], [ænd], [ʌv]) or a more reduced or lenited pronunciation (e.g., [ðə], [ðɨt], [n], [ə]). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample of conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and in whether final obstruents were present. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that make high-frequency monosyllabic function words more likely to be longer or to have a fuller form (1) when neighboring disfluencies (such as the filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; and (3) when the word is either utterance initial or utterance final. Viewed the other way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech), and some of the differences among the ten function words in their response to these factors.
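The abstract does not say how predictability in context was measured. As a rough illustration only, one common choice is the conditional probability of a word given the preceding word, estimated by relative frequency. The sketch below assumes that measure; the function name `bigram_predictability` and the toy token list are hypothetical and are not taken from the study or from Switchboard.

```python
from collections import Counter


def bigram_predictability(tokens):
    """Estimate P(word | previous word) by relative frequency.

    Returns a dict mapping (prev, word) pairs to the estimate.
    This is an assumed measure, not necessarily the one used in the study.
    """
    # Count each word as a "previous word" (the last token has no successor).
    prev_counts = Counter(tokens[:-1])
    # Count adjacent word pairs.
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}


# Toy, made-up token sequence for illustration.
tokens = "kind of a lot of the time i sort of think".split()
probs = bigram_predictability(tokens)

# "of" always follows "kind" here, so it is highly predictable in that context;
# "the" is only one of three words that follow "of", so it is less predictable.
print(probs[("kind", "of")])
print(probs[("of", "the")])
```

Under the study's finding, a function word in a highly predictable slot (a high value here, as in a collocation like "kind of") would be expected to be shorter or more reduced than the same word in a low-probability slot. In a real analysis such a value would enter the regression as one predictor alongside the controls the abstract lists.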
