Northern Thai Dialect Text to Speech

Each dialect of the Thai language has a distinct identity tied to its accent. Although the dialects share a common standard-language origin, visitors to each region inevitably have to converse with native speakers of the local dialect. The need to communicate with people who understand only the Northern Thai Dialect (NTD) led us to develop a Northern Thai Dialect Text-to-Speech system (NTD-TTS). The idea follows the same concept as a translation program: given text input in the Central Thai Dialect (CTD), the TTS system translates it and synthesizes speech output in NTD. The system was built on a TTS software structure in which two components were modified: the Grapheme-to-Phoneme (G2P) converter and the speech models. The NTD-G2P converter was built with rule-based and dictionary-based approaches and was evaluated on 100 sentences randomly selected from ORCHID. It achieves a syllable-level conversion accuracy of 83.19% and was used to build the NTD-corpus. Sentences were then selected to train the NTD speech model; the selected set covers 95.32% in the first percentile of the phoneme distribution in the NTD-corpus. After the speech models were connected to the TTS system, the whole system was evaluated by native speakers using the Mean Opinion Score (MOS) and syllable-level comprehension. The NTD-MOS evaluations indicated that the accent, naturalness, and intelligibility of the synthetic speech ranged from “acceptable” to “good”. On the test set, NTD native listeners gave the NTD-TTS system scores of 3.73 for accent, 3.68 for naturalness, and 3.63 for intelligibility, with a comprehension rate of 97.16%.
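As a rough illustration of the pipeline and metrics described above, the following Python sketch shows a dictionary-first G2P lookup with a rule-based fallback, a syllable-level accuracy measure, and a mean opinion score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy lexicon, the fallback rule, and the phoneme strings are all hypothetical placeholders, and the abstract does not specify how the rules and the dictionary are actually combined or how syllables are aligned for scoring.

```python
from typing import Callable, Dict, List


def convert(syllables: List[str],
            lexicon: Dict[str, str],
            fallback_rule: Callable[[str], str]) -> List[str]:
    """Hypothetical G2P step: use the dictionary entry when one exists,
    otherwise apply a rule to the unknown syllable."""
    return [lexicon[s] if s in lexicon else fallback_rule(s) for s in syllables]


def syllable_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Percentage of syllables whose predicted phonemes match the reference.
    Assumes both lists are already aligned one-to-one."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and aligned")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def mean_opinion_score(ratings: List[float]) -> float:
    """Arithmetic mean of listener ratings (assumed to be on a 1-5 scale)."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    return sum(ratings) / len(ratings)


# Toy data only: these strings are placeholders, not real Thai phonemes.
toy_lexicon = {"a": "A1", "b": "B2"}
toy_rule = lambda s: s.upper() + "0"

predicted = convert(["a", "b", "c"], toy_lexicon, toy_rule)
accuracy = syllable_accuracy(predicted, ["A1", "B2", "C1"])
mos = mean_opinion_score([4, 3, 4, 4])
```

In this toy run the third syllable is missing from the lexicon, so the fallback rule handles it and produces a mismatch against the reference, which is the kind of error a syllable-level accuracy figure would count.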
