A Predictive Model of Prosody Through Grammatical Interface: A Computational Approach

Speech prosody is manifested in the speech signal through the modulation of pitch, loudness, duration, and voice quality, which combine to encode the prosodic structure of an utterance. Prosodic structure defines the location of prominent words and the grouping of words into phonological phrases. It in turn relates the phonological form of an utterance to its morphological, syntactic, semantic, and pragmatic context. The listener's task in comprehending speech includes decoding prosodic structure, which helps identify the linguistic contexts that make up the meaning of the utterance. The research reported in this book focuses on acoustic and perceptual evidence for prosody in spoken American English and on the relationship between prosodic structure and higher levels of linguistic organization. The study adopts a computational approach that uses natural language processing and machine learning to investigate prosody in a speech corpus. It shows that prosodic features can be reliably predicted from a set of features that encode the phonetic, phonological, syntactic, and semantic properties of an utterance.
