Paper information - Slovenian Text-to-Speech Synthesis for Speech User Interfaces

The paper presents the design concept of a unit-selection text-to-speech synthesis system for the Slovenian language. Owing to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from carrier-grade server voice portals and desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high-coverage text sentences. An experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with little redundancy. The adequacy of the spoken output was evaluated with several subjective tests recommended by the International Telecommunication Union (ITU).

Keywords: text-to-speech synthesis, prosody modeling, speech user interface.
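The greedy subset selection mentioned in the abstract can be illustrated with a minimal sketch. This record does not say which units the authors cover or how they score sentences, so the sketch assumes character bigrams as a stand-in for phonetic units and uses the plain "largest number of newly covered units" criterion. The function names `units` and `greedy_select` are hypothetical and are not taken from the paper.

```python
def units(sentence: str, n: int = 2) -> set[str]:
    """Return the character n-grams of a sentence.

    Assumption: character n-grams stand in for the phonetic units
    (e.g. diphones) that a real TTS corpus design would count.
    """
    text = sentence.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def greedy_select(sentences: list[str], n: int = 2) -> list[str]:
    """Greedily pick sentences until every unit in the corpus is covered.

    At each step the sentence that adds the most still-uncovered units
    is chosen; ties go to the sentence that appears first in the input.
    """
    unit_sets = [units(s, n) for s in sentences]
    uncovered = set().union(*unit_sets)
    remaining = set(range(len(sentences)))
    chosen: list[str] = []

    while uncovered and remaining:
        # Iterate in index order so that ties are broken deterministically.
        best = max(sorted(remaining), key=lambda i: len(unit_sets[i] & uncovered))
        gain = unit_sets[best] & uncovered
        if not gain:
            break  # no remaining sentence contributes anything new
        chosen.append(sentences[best])
        uncovered -= gain
        remaining.discard(best)

    return chosen


if __name__ == "__main__":
    corpus = ["abc", "bcd", "abcd", "xy"]
    print(greedy_select(corpus))
```

In this toy corpus the sentence "abcd" already covers every bigram of "abc" and "bcd", so those two are never selected. This is the sense in which a greedy pass yields a compact subset with little redundancy.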

Jerneja Zganec-Gros | Ales Mihelic | Nikola Pavesic | Mario Zganec | Stanislav Gruden

[1] Mehryar Mohri, et al. Rapid unit selection from a large speech corpus for concatenative speech synthesis, 1999, EUROSPEECH.

[2] Richard Sproat, et al. The Bell Labs German text-to-speech system: an overview, 1997, EUROSPEECH.

[3] Simon Dobrisek, et al. Spoken Language Resources at LUKS of the University of Ljubljana, 2003, Int. J. Speech Technol.

[4] Matjaz Gams, et al. SPEAKER (GOVOREC): A Complete Slovenian Text-to-Speech System, 2003, Int. J. Speech Technol.

[5] Alistair Conkie. A robust unit selection system for speech synthesis, 1999.

[6] Alan W. Black, et al. Arabic in my hand: small-footprint synthesis of Egyptian Arabic, 2003, INTERSPEECH.

[7] Thierry Dutoit, et al. High-quality speech synthesis for phonetic speech segmentation, 1997, EUROSPEECH.

[8] Thierry Dutoit, et al. Text design for TTS speech corpus building using a modified greedy selection, 2003, INTERSPEECH.

[9] Hideyuki Mizuno, et al. Recording script design for corpus-based TTS system based on coverage of various phonetic elements, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[10] France Mihelic, et al. Syllable and segment duration at different speaking rates in the Slovenian language, 1997, EUROSPEECH.

[11] Simon Dobrisek, et al. Homer II - man-machine interface to internet for blind and visually impaired people, 2003, Comput. Commun.

[12] F. Park. Robust unit selection system for speech synthesis, 1999.

[13] Joram Meron, et al. Compression of exception lexicons for small footprint grapheme-to-phoneme conversion, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[14] France Mihelic, et al. Evaluation of the Slovenian HMM-Based Speech Synthesis System, 2004, TSD.

[15] France Mihelic, et al. Slovene Interactive Text-to-Speech Evaluation Site - SITES, 1999, TSD.

[16] Chih-Chung Kuo, et al. Efficient and scalable methods for text script generation in corpus-based TTS design, 2002, INTERSPEECH.

[17] France Mihelic, et al. Speech timing in Slovenian TTS, 1997, EUROSPEECH.

[18] Jan P. H. van Santen, et al. Methods for optimal text selection, 1997, EUROSPEECH.

[19] Georgios Kouroupetroglou, et al. An intonation model for embedded devices based on natural F0 samples, 2004, INTERSPEECH.

[20] Alan W. Black, et al. Flite: a small fast run-time synthesis engine, 2001, SSW.

[21] Marc C. Beutnagel, et al. The AT&T Next-Gen TTS system, 1999.