Paper information - User attitudes to concatenated natural speech and text-to-speech synthesis in an automated information service


Today’s automated telephone services generally use recorded speech from one speaker for all output. In applications with large and varying output vocabularies, such as place names, it may be necessary to employ a second speaker to provide new vocabulary items if the original speaker is not available, or to use text-to-speech (TTS) synthesis for the whole or parts of the output. This paper reports a comparison of 10 schemes for the generation of spoken output in a travel information service, ranging from natural speech from a single speaker, through combinations of different voices and of natural and synthetic speech, to TTS synthesis throughout. The results show strong preferences for concatenated speech over TTS and for professional-quality recordings over amateur ones, and a weaker preference for a single speaker over two speakers.

Mike Edgington | Mervyn A. Jack | Fergus R. McInnes | David Attwater | Mark S. Schmidt

[1] J. H. Page, et al. The Laureate text-to-speech system: architecture and applications, 1996.