Prominence detected by listeners for future speech synthesis application

This pilot study identifies, and presents preliminary statistics on, the words that native speakers perceive as prominent in read-aloud texts taken from a Russian corpus built for unit-selection text-to-speech (TTS) synthesis. Because a TTS system relies on the linguistic information encoded in the input text, parameters that are easily extracted from the text (part-of-speech class, number of syllables) serve as the basis for classifying the words that listeners marked as prominent. At later stages, the TTS system has to assign a prosodic structure and its suprasegmental acoustic parameters. To find some of the acoustic correlates of this prominence, the expert phonetic segmentation and the analysis of the syntagmatic structure of the material are compared with the judgments of native speakers.
