From syntax to acoustic duration: A dynamical model of speech rhythm production

This paper presents a model of speech rhythm production that generates segmental acoustic duration from several levels of dynamical coupling between linguistic and production-related subsystems. A probabilistic algorithm for phrase stress assignment accounts for both prominence and constituency relations in prosody by coupling a dependency-grammar system of markers with constraints on constituent size. The algorithm accommodates intra- and inter-speaker prosodic variability. Taking as input the position and magnitude of underlying phrase stress, together with a set of dynamical control parameters, the model operates over three nested temporal domains to assign segmental duration in Brazilian Portuguese. The modelled vowel-to-vowel (V-to-V) duration patterns reproduce those observed at the surface under several conditions of perturbation. The nature of the dynamical model and its advantages for simulating natural data are discussed in detail.
