Implementation and evaluation of a text-to-speech synthesis system for Turkish

This paper presents a diphone-based Text-to-Speech (TTS) system for the Turkish language. Turkish is the official language of Turkey, where it is the native language of 70 million people, and it is also widely spoken in Asia (Azerbaijan, Uzbekistan, Kazakhstan, Kyrgyzstan and Iran), Cyprus and the Balkans. The research was carried out during a visiting internship at CSLR (the Center for Spoken Language Research, University of Colorado at Boulder) as part of an ongoing collaboration between CSLR and the Department of Electrical and Electronics Engineering at METU (Middle East Technical University). The system is built on the Festival Speech Synthesis System. A diphone database was designed for Turkish, and the tools developed for quick diphone collection and segmentation are illustrated. The text analysis module and the methods used to determine segment durations and pitch contours are discussed in detail. A Diagnostic Rhyme Test (DRT) was designed for Turkish to test the intelligibility of the output speech. In a test with 20 listeners, the resulting TTS system was found to be 86.5% intelligible on average. This is the first diphone-based Turkish TTS system whose intelligibility has been reported. We also believe that this paper will help researchers building TTS voices, especially those working on agglutinative languages, since every step needed along the way is explained in detail.