It is commonly agreed that one of the major goals in the development of modern text-to-speech synthesis is the improvement of prosody, especially intonation. Although high-quality intonation is an important factor on the way to more natural synthetic speech, it is seldom examined empirically whether and how it affects the relative performance of other components, such as segmental synthesis. The present paper discusses two preliminary rating experiments investigating the relation between the naturalness of intonation and subjective segmental quality in Finnish. Experiment 1 showed that the perception of intonation depends on segmental quality. More crucially, Experiment 2 indicated that perceived segmental acceptability is, in turn, significantly dependent on the relative naturalness of intonation. In light of these observations, improved intonation is desirable not only for the sake of overall quality: it is also shown to improve the subjective perception of a very basic feature of synthetic speech, namely segmental acceptability.