Characterisation of rhythmic patterns for text-to-speech synthesis

Abstract

This article proposes an alternative rhythmic unit to the syllable: the inter-perceptual-center group (IPCG). An IPCG is delimited by perceptual centers (P-centers), events that can be detected from acoustic correlates alone (Pompino-Marschall, 1989). We use this unit to describe the rhythmic patterns of French. We show that accents are realised gradually over the trailed accentual group, and that this gradual lengthening is needed for the accent to be perceived. We then describe a model that distributes the IPCG duration among its segmental constituents. The model also generates pauses automatically according to speech rate, deciding both whether a pause occurs and how long it lasts.
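
The abstract gives the duration model only in outline. One way to make it concrete is the shared z-score ("elasticity") idea behind the syllable-frame model of [8], applied here to the IPCG instead of the syllable. In that approach every segment is stretched by the same number of standard deviations in the log domain until the group reaches its target duration. The sketch below is an illustration under stated assumptions, not the article's model. Several elements are hypothetical: the per-segment log-duration statistics, the pause threshold `k_max`, placing the pause at the end of the group, and the `rate` scaling of P-center intervals.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One segment of an IPCG with hypothetical log-normal duration statistics."""
    label: str
    mean_log: float  # mean of ln(duration in seconds)
    sd_log: float    # std. dev. of ln(duration); must be > 0


def durations_at(segments, k):
    """Durations when every segment sits at the same z-score k in the log domain."""
    return [math.exp(s.mean_log + k * s.sd_log) for s in segments]


def solve_k(segments, target, lo=-5.0, hi=5.0, iters=60):
    """Bisection for the shared z-score k whose durations sum to `target`.

    The summed duration grows monotonically with k (every sd_log > 0),
    so bisection converges; the result is clamped to [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(durations_at(segments, mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def distribute_ipcg(segments, target, k_max=2.0):
    """Split one target IPCG duration into segment durations plus a pause.

    Hypothetical pause rule: if the segments would have to stretch beyond
    z-score k_max, they are capped at k_max and the leftover time becomes
    a pause closing the group.
    """
    k = solve_k(segments, target)
    if k <= k_max:
        return durations_at(segments, k), 0.0
    capped = durations_at(segments, k_max)
    return capped, target - sum(capped)


def ipcg_targets(p_centers, rate=1.0):
    """Target IPCG durations from successive P-center times (seconds).

    Hypothetical rate scaling: rate > 1 is faster speech, rate < 1 slower.
    """
    return [(b - a) / rate for a, b in zip(p_centers, p_centers[1:])]


if __name__ == "__main__":
    # Made-up statistics for two consonant+vowel groups.
    groups = [
        [Segment("p", math.log(0.06), 0.25), Segment("a", math.log(0.09), 0.30)],
        [Segment("t", math.log(0.05), 0.25), Segment("i", math.log(0.07), 0.30)],
    ]
    p_centers = [0.0, 0.2, 0.38]
    for rate in (1.0, 0.5):
        for segs, target in zip(groups, ipcg_targets(p_centers, rate)):
            durs, pause = distribute_ipcg(segs, target)
            print(rate, [round(d, 3) for d in durs], round(pause, 3))
```

With these made-up numbers, both groups absorb their interval without a pause at rate 1.0. Halving the rate pushes both groups past `k_max`, so a pause emerges in each. Capping at a shared z-score keeps the relative stretching of the segments uniform. The value of `k_max` and the position of the pause are open choices in this sketch.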

References

[1] Katarina Bartkova et al. (1987). A model of segmental duration for speech synthesis in French. Speech Communication.

[2] Gérard Bailly et al. (1989). Integration of rhythmic and syntactic constraints in a model of generation of French prosody. Speech Communication.

[3] K. Pike et al. (1946). The intonation of American English.

[4] D. Klatt (1976). Linguistic uses of segmental duration in English: acoustic and perceptual evidence. The Journal of the Acoustical Society of America.

[5] Michael I. Jordan (2018). Motor Learning and the Degrees of Freedom Problem. Attention and Performance XIII.

[6] Paul Touati et al. (1987). Structures prosodiques du suédois et du français : profils temporels et configurations tonales.

[7] D. Duez et al. (1987). Contribution à l'étude de la structuration temporelle de la parole en français.

[8] Stephen Isard et al. (1991). Segment durations in a syllable frame.

[9] Gérard Bailly et al. (1990). Automatic labeling of large prosodic databases: tools, methodology and links with a text-to-speech system. SSW.

[10] Gérard Bailly et al. (1994). Automatic labelling of large prosodic databases: tools, methodology and links with a text-to-speech system.

[11] P. Howell et al. (1988). Prediction of P-center location from the distribution of energy in the amplitude envelope: I. Perception & Psychophysics.

[12] A. Classe (1939). The rhythm of English prose.

[13] P. Viviani et al. (1992). Motor-perceptual interactions.

[14] J. Hart et al. (1973). Intonation by rule: a perceptual quest.

[15] Lennart Nord et al. (1991). Durational correlates of stress in Swedish, French and English.

[16] Andrej Ljolje et al. (1986). Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models. IEEE Transactions on Acoustics, Speech, and Signal Processing.

[17] B. Wenk et al. (1982). Is French really syllable-timed?

[18] D. O'Shaughnessy et al. (1984). A multispeaker analysis of durations in read French paragraphs. The Journal of the Acoustical Society of America.

[19] Alexander G. Hauptmann et al. (1993). SPEAKEZ: a first experiment in concatenation synthesis from a large corpus. EUROSPEECH.

[20] J. V. Santen et al. (1990). The analysis of contextual effects on segmental duration.

[21] George D. Allen et al. (1975). Speech Rhythm: Its Relation to Performance Universals and Articulatory Timing.

[22] F. Grosjean et al. (1993). Les structures de performance en français: caractérisation et prédiction.

[23] Colin W. Wightman et al. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. The Journal of the Acoustical Society of America.

[24] Wayne A. Lea (1974). Prosodic Aids to Speech Recognition. 4. A General Strategy for Prosodically-Guided Speech Understanding.

[25] Yao Shen et al. (1962). Isochronism in English.

[26] M. T. Turvey et al. (1989). ‘Clock’ and ‘motor’ components in absolute coordination of rhythmic movements. Neuroscience.

[27] G. Fant et al. (2007). Preliminaries to the study of Swedish prose reading and reading style. Speech, Music and Hearing Quarterly Progress and Status Report.

[28] S. M. Marcus (1981). Acoustic determinants of perceptual center (P-center) location. Perception & Psychophysics.

[29] Bernd Pompino-Marschall et al. (1989). On the Psychoacoustic Nature of the P-Center Phenomenon.

[30] D. O'Shaughnessy (1984). A multispeaker analysis of durations in read French paragraphs.

[31] V. Aubergé et al. (1994). Developing a structured lexicon for synthesis of prosody.

[32] John N. Gowdy et al. (1989). Neural network based generation of fundamental frequency contours. International Conference on Acoustics, Speech, and Signal Processing.