Prediction of tone naturalness perception using geometric model

Naturalness is an important issue in Text-To-Speech (TTS) systems. A TTS system that supports arbitrarily defined pitch contours for any synthesized syllable should still be able to maintain the naturalness of the synthetic speech. This work proposed an automatic evaluation of pitch contours to determine how natural synthesized syllables sound to human listeners. By analyzing the results of tone perception experiments conducted with human listeners in this work, a syllable tone naturalness prediction model based on the midpoint and endpoint of the syllable's rhyme part was proposed. The model was then used to develop a tone naturalness prediction algorithm using geometric models of pitch contours. To evaluate the algorithm, human listeners judged the naturalness of syllables with 45 pitch contour patterns, each presented twice. The proposed algorithm achieved a consistency rate of approximately 80% when compared against the human listeners' decisions on the tone naturalness of the syllables.
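The abstract does not define how the consistency rate was computed. One plausible reading, given here only as a minimal sketch, is the fraction of stimuli on which the algorithm's natural/unnatural decision matches the listeners' decision. The function name `consistency_rate` and the boolean encoding of decisions are assumptions made for illustration, not taken from the paper.

```python
from typing import Sequence


def consistency_rate(predicted: Sequence[bool], perceived: Sequence[bool]) -> float:
    """Fraction of stimuli where the predicted decision equals the perceived one.

    Assumption: each decision is binary (True = natural, False = unnatural).
    The paper may aggregate listener responses differently.
    """
    if len(predicted) != len(perceived):
        raise ValueError("both sequences must cover the same stimuli")
    if len(predicted) == 0:
        raise ValueError("at least one stimulus is required")
    # Count positions where the algorithm agrees with the listeners.
    matches = sum(p == h for p, h in zip(predicted, perceived))
    return matches / len(predicted)


# Hypothetical decisions for five stimuli (illustrative values, not study data).
algorithm_decisions = [True, False, True, True, False]
listener_decisions = [True, False, False, True, False]
rate = consistency_rate(algorithm_decisions, listener_decisions)
```

Under this reading, the 45 patterns with 2 repetitions each would give 90 stimulus presentations per listener, and a rate near 0.8 would mean agreement on roughly four out of five of them.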
