Automatic generation of prosody: comparing two superpositional systems

Abstract

We face many options when designing a system that automatically generates prosody from linguistic and paralinguistic information. The literature provides several candidate phonetic models, phonological models and mapping tools with which to implement such a system. We detail here some dimensions along which these models have to be compared. We also show that systems employing quite similar phonetic models can still take radically different approaches. We present the results of a first evaluation comparing two systems that use a superpositional model of melody, on a common multilingual prosodic database of spoken math formulae. We conclude that prosodic models and intonation theories could certainly benefit from well-defined tasks and fair benchmarks.

1. Introduction

It is a commonly accepted view that prosody crucially shapes the speech signal in order to ease the listener's decoding of linguistic and paralinguistic information.

The present study compares two automatic prosody generation systems applied to a bilingual corpus. Our aim is not to designate a "winner" but to promote inter-system comparison on common data as a valuable aid for shedding light on the strengths and weaknesses of each system. The systems used here face different challenges: the SFC is applied to German, whereas its original target language is French; the IGM has to deal with both a new type of document (spoken mathematical formulae) and a new language (French). The reader should keep in mind that the challenge for the latter demands more invasive adaptation of the system, so its results are to be considered preliminary.

After introducing the framework in which we approach prosody generation systems, we briefly describe the two systems (§3). Objective and subjective evaluations (§4) then enable us to detail some properties of the respective systems (§5).
