A Phonetic Model of English Intonation

This thesis proposes a phonetic model of English intonation, that is, a system for linking the phonological and F0 descriptions of an utterance. It is argued that such a model should take the form of a rigorously defined formal system which requires no human intuition or expertise to operate, and that it should be capable of both analysis (F0 to phonology) and synthesis (phonology to F0). Existing phonetic models are reviewed and it is shown that none meets the specification for the type of formal model required. A new phonetic model is presented that has three levels of description: the F0 level, the intermediate level and the phonological level. The intermediate level uses three basic elements (rise, fall and connection) to model F0 contours. A mathematical equation is specified for each of these elements so that a continuous F0 contour can be created from a sequence of elements. The phonological system uses H and L to describe high and low pitch accents, C to describe connection elements and B to describe the rises that occur at phrase boundaries. A fully specified grammar is described which links the intermediate and F0 levels. A grammar is also specified for linking the phonological and intermediate levels, but it is only partly complete because of problems with the phonological level of description. A computer implementation of the model is described; most of the implementation work concentrated on the relationship between the intermediate level and the F0 level. Results are given showing that the computer analysis system labels F0 contours quite accurately, but significantly less accurately than a human labeller. It is shown that the synthesis system produces artificial F0 contours that are very similar to naturally occurring F0 contours. The thesis concludes with some indications of further work and ideas on how the computer implementation of the model could be of practical benefit in speech synthesis and recognition.
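
The synthesis direction at the intermediate level (a sequence of rise, fall and connection elements turned into a continuous F0 contour) can be illustrated with a minimal sketch. The abstract does not give the element equations, so everything here is an assumption made for illustration: the piecewise-quadratic S-shaped curve used for rises and falls, the straight line used for connections, and the names `Element`, `element_shape` and `synthesize`, along with the 10 ms frame shift.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """One intermediate-level element (hypothetical representation)."""
    kind: str         # "rise", "fall" or "conn"
    duration: float   # seconds
    amplitude: float  # F0 change in Hz over the element (negative for a fall)


def element_shape(kind: str, x: float) -> float:
    """Normalised shape: maps x in [0, 1] to the fraction of the amplitude reached.

    Assumption: rises and falls follow a smooth piecewise-quadratic S-curve,
    connections are straight lines. The thesis's own equations may differ.
    """
    if kind == "conn":
        return x
    if x < 0.5:
        return 2.0 * x * x
    return 1.0 - 2.0 * (1.0 - x) ** 2


def synthesize(elements: List[Element], start_f0: float,
               frame_shift: float = 0.01) -> List[float]:
    """Concatenate elements into one F0 contour sampled every frame_shift seconds."""
    contour: List[float] = []
    f0 = start_f0
    for el in elements:
        n_frames = max(1, round(el.duration / frame_shift))
        for i in range(n_frames):
            x = i / n_frames
            contour.append(f0 + el.amplitude * element_shape(el.kind, x))
        f0 += el.amplitude  # each element starts where the previous one ended
    contour.append(f0)      # final endpoint of the last element
    return contour


if __name__ == "__main__":
    # A rise-fall accent followed by a slowly declining connection.
    sequence = [
        Element("rise", 0.2, 40.0),
        Element("fall", 0.3, -60.0),
        Element("conn", 0.5, -10.0),
    ]
    f0_contour = synthesize(sequence, start_f0=120.0)
    print(len(f0_contour), f0_contour[0], f0_contour[-1])
```

Because each element begins at the F0 value where the previous one ended, the resulting contour is continuous by construction, which is the property the abstract attributes to the element equations. Analysis (F0 to elements) would be the inverse problem and is not sketched here.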
