DURATION ANALYSIS AND MODELLING FOR TURKISH TEXT-TO-SPEECH SYNTHESIS

Naturalness plays a major role in the acceptability of TTS output. Rhythm, intonation, stress pattern, pitch and duration (timing) are the most important parameters that affect the naturalness of TTS output. The task of the timing component in a TTS system is to compute duration information for the sub-elements used in the synthesis output. Duration modelling is a very challenging part of a TTS system, since very little is known about the underlying process responsible for human speech timing. To analyze and model duration for Turkish TTS systems, spoken utterances of 1-words and sentences by an adult male speaker, recorded at high digital quality, are used. First, the coverage of Turkish by this spoken text corpus is investigated and found to be adequate. Next, the durations of Turkish phonemes are analyzed. The effects on duration of factors that can be computed from text are measured to determine which of them should be included in the duration models. Four duration models have been implemented. The first two models use the mean durations of phonemes and of triphones, respectively. The third model predicts duration from the mean durations at the nodes of trees built for triphones. The last model is an additive model in which the effects of the factors are found by regression analysis.
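The first two models described above can be illustrated with a minimal Python sketch. It is not the implementation from this work: the function names, the `#` boundary symbol, the toy durations, and in particular the fallback from triphone mean to phoneme mean for unseen contexts are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean

BOUNDARY = "#"  # hypothetical symbol for utterance edges (an assumption)


def train_mean_models(utterances):
    """Collect mean durations per phoneme and per triphone.

    utterances: list of utterances, each a list of (phoneme, duration_ms).
    Returns (phoneme_mean, triphone_mean) dictionaries.
    """
    phone_durs = defaultdict(list)
    tri_durs = defaultdict(list)
    for utt in utterances:
        # Pad so that every phoneme has a left and a right neighbour.
        padded = [BOUNDARY] + [p for p, _ in utt] + [BOUNDARY]
        for i, (phone, dur) in enumerate(utt):
            phone_durs[phone].append(dur)
            tri_durs[(padded[i], phone, padded[i + 2])].append(dur)
    phone_mean = {p: mean(ds) for p, ds in phone_durs.items()}
    tri_mean = {t: mean(ds) for t, ds in tri_durs.items()}
    return phone_mean, tri_mean


def predict(phonemes, phone_mean, tri_mean):
    """Predict one duration per phoneme.

    Uses the triphone mean when the context was seen in training, otherwise
    the phoneme mean, otherwise the average of all phoneme means. This
    fallback order is an assumption of the sketch.
    """
    padded = [BOUNDARY] + list(phonemes) + [BOUNDARY]
    global_mean = mean(phone_mean.values())
    durations = []
    for i, phone in enumerate(phonemes):
        tri = (padded[i], phone, padded[i + 2])
        if tri in tri_mean:
            durations.append(tri_mean[tri])
        else:
            durations.append(phone_mean.get(phone, global_mean))
    return durations


# Toy data with made-up durations in milliseconds.
toy_corpus = [
    [("e", 80), ("v", 60)],
    [("e", 100), ("l", 50)],
]
phone_mean, tri_mean = train_mean_models(toy_corpus)
seen = predict(["e", "v"], phone_mean, tri_mean)
unseen = predict(["e", "t"], phone_mean, tri_mean)
```

In the toy run, `seen` uses triphone means throughout, while `unseen` falls back to the phoneme mean for "e" because the context with "t" never occurs in the toy corpus.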
