Effects on TTS quality of methods of realizing natural prosodic variations

This study attempts to determine whether natural prosodic variations and different methods of applying prosodic patterns are relevant to listeners' perceptions of synthetic speech quality. The prosodic patterns of five test sentences, including Yes-No-questions, Wh-questions, declaratives, and continuation rises, as produced by six female native speakers of four varieties of English, were imposed on the same US English voice using four different methods. Results of a perceptual experiment involving 32 listeners show that the methods resulting in fewer distortions and artifacts are preferred to a significant degree, thus favoring synthesis approaches with minimal signal modification and prosodic patterns without extreme parameter values. An additional test that includes a more obvious prosodic phrasing error further clarifies that prosody becomes a more significant factor when no meaningful interpretation is evident in the given context.