A general-purpose IsiZulu speech synthesizer

A general-purpose isiZulu text-to-speech (TTS) system was developed, based on the ‘Multisyn’ unit-selection approach supported by the Festival TTS toolkit. The development involved a number of challenges at the interface between speech technology and linguistics, such as choosing an appropriate set of phonetic units, producing reliable pronunciations, and developing appropriate cost functions for selecting and joining diphone units. We show how solutions were found for each of these challenges, and describe a number of other innovations (such as automated fault detection in manual alignments) that were introduced. Initial evaluations suggest that the synthesizer is usable by a wide spectrum of isiZulu speakers.
