Paper information: Türkçe Metinden Konuşma Sentezlemede Doğallığın Artırılması İçin Öneriler / Recommendations for Increasing the Naturalness in Turkish Text-to-Speech Synthesis

Abstract: Text-to-speech (TTS) synthesis is the automatic reading of a written text by a system. In this work, a diphone-based concatenative TTS system was designed and implemented. The pitch-synchronous overlap-add (PSOLA) method is used for concatenation. Speech synthesizers usually have no intonation model, or only an incomplete one, which degrades the naturalness of the synthesized speech. This study proposes a new model to address that gap: duration- and accent-based rules are defined on the intonation of the speech to increase its naturalness. The rules were determined through an extensive set of experiments performed in the designed testbed, and their effect on naturalness was measured with subjective listening tests. Applying the rules in the developed synthesizer yielded an improvement of 1.86/5.00 in the CMOS test, which indicates that the intonation model is successful.
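As a rough illustration of the diphone-based approach the abstract mentions, the sketch below shows only how a phoneme sequence could be split into diphone units (each unit spans a pair of adjacent phonemes). It is not the authors' implementation: the function name `to_diphones`, the `_` silence symbol, the `-` separator, and the example phoneme list are all assumptions made for illustration, and the PSOLA joining and the duration and accent rules are not shown.

```python
from typing import List

# Hypothetical symbol for the silence at the start and end of an utterance.
SILENCE = "_"


def to_diphones(phonemes: List[str]) -> List[str]:
    """Return one diphone label for each pair of adjacent phonemes.

    The sequence is padded with silence on both sides so that the first
    and last phonemes also get a transition unit.
    """
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [left + "-" + right for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    # Illustrative phoneme list (an assumption, not taken from the paper).
    print(to_diphones(["k", "a", "l", "e", "m"]))
    # prints ['_-k', 'k-a', 'a-l', 'l-e', 'e-m', 'm-_']
```

In a concatenative synthesizer, each label would be looked up in a recorded diphone inventory before the waveforms are joined; that lookup and the joining step are outside this sketch.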

H. Gokhan Ilk | Baran Uslu | A. Egemen Yılmaz
