Timing in Speech: A Multi-Level Process

The modelling of timing in speech is of particular interest to language technology because it represents an interface between the cognitive and mechanical aspects of speech production. Higher-level aspects of timing control depend on speaker-specific and utterance-context-specific factors, so the same sequence of sounds produced by two different speakers (or by one speaker on two different occasions) will most likely have different timing characteristics. Lower-level aspects, on the other hand, ensure that the productions will also share some similarities, since the vocal tract and articulatory mechanisms used to produce the individual sounds are fundamentally similar for all speakers of a given language. The interaction of these two levels of influence results in timing patterns that can be interpreted to add an extra layer of meaning to an utterance.
