User attitudes to concatenated natural speech and text-to-speech synthesis in an automated information service
Today’s automated telephone services generally use recorded speech from one speaker for all output. In applications with large and varying output vocabularies, such as place names, it may be necessary to employ a second speaker to provide new vocabulary items if the original speaker is not available, or to use text-to-speech (TTS) synthesis for the whole or parts of the output. This paper reports a comparison of 10 schemes for the generation of spoken output in a travel information service, ranging from natural speech from a single speaker, through combinations of different voices and of natural and synthetic speech, to TTS synthesis throughout. The results show strong preferences for concatenated speech over TTS and for professional-quality recordings over amateur ones, and a weaker preference for a single speaker over two speakers.