This study attempts to determine whether natural prosodic variations and different methods of applying prosodic patterns affect listeners' perceptions of synthetic speech quality. The prosodic patterns of five test sentences, including yes-no questions, wh-questions, declaratives, and continuation rises, as produced by six female native speakers of four varieties of English, were imposed on the same US English voice using four different methods. Results of a perceptual experiment involving 32 listeners show that the methods resulting in fewer distortions and artifacts are preferred to a significant degree, thus favoring synthesis approaches with minimal signal modification and prosodic patterns without extreme parameter values. An additional test that includes a more obvious prosodic phrasing error further clarifies that prosody becomes a more significant factor when no meaningful interpretation is evident in the given context.