Implementing Thai text-to-speech synthesis for hand-held devices

The quality of Thai text-to-speech (TTS) synthesis has greatly improved over the past decade. To extend its use to platforms with different computational capacities, this paper describes an actual implementation of a Thai TTS system for hand-held devices. The system consists of two parts: text analysis and speech synthesis. To cope with the limited resources of these devices, a rule-based technique was chosen for text analysis, while speech synthesis uses a model-based approach, specifically hidden Markov models (HMMs), rather than a concatenative one. Parameters of the synthesis part were varied experimentally to reduce computation time, and both execution time and subjective mean opinion scores (MOS) were measured. Experimental results show that computational requirements can be reduced, up to a point, by shortening the synthesis filter impulse response and reducing the dimension of the feature vectors without sacrificing synthetic speech quality.
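Why shortening the impulse response and the feature vector saves computation can be illustrated with a minimal sketch. The abstract does not say how the synthesis filter is realized, so this assumes, for illustration only, that each frame's spectral envelope is given as a plain (not mel-warped) cepstrum of order M, converted to a minimum-phase impulse response truncated to L taps, and applied to the excitation as an FIR filter. A real HMM-based system typically uses a mel-cepstral (MLSA) filter instead. The function names, the one-pole test envelope, the pulse excitation, and the 80-sample frame are all hypothetical.

```python
import math


def cepstrum_to_impulse_response(c, length):
    """Minimum-phase impulse response h[0..length-1] from cepstrum c[0..M].

    Standard recursion for a minimum-phase sequence:
      h[0] = exp(c[0])
      h[n] = sum_{k=1}^{min(n, M)} (k / n) * c[k] * h[n - k],  n >= 1
    """
    order = len(c) - 1  # M: cepstral order; the feature vector holds M + 1 values
    h = [0.0] * length
    h[0] = math.exp(c[0])
    for n in range(1, length):
        h[n] = sum((k / n) * c[k] * h[n - k] for k in range(1, min(n, order) + 1))
    return h


def fir_filter(excitation, h):
    """Direct-form FIR convolution: up to len(h) multiply-adds per output sample."""
    out = []
    for n in range(len(excitation)):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * excitation[n - k]
        out.append(acc)
    return out


def multiply_adds_per_frame(order, ir_length, frame_len):
    """Rough per-frame cost: impulse-response recursion plus FIR filtering."""
    ir_cost = sum(min(n, order) for n in range(1, ir_length))
    filter_cost = frame_len * ir_length
    return ir_cost + filter_cost


if __name__ == "__main__":
    # Hypothetical one-pole envelope 1 / (1 - a z^-1): c[0] = 0, c[k] = a**k / k.
    a, order = 0.9, 24
    c = [0.0] + [a ** k / k for k in range(1, order + 1)]

    # Hypothetical excitation: a single pulse followed by silence.
    excitation = [1.0] + [0.0] * 199

    # Long impulse response used as the reference output.
    reference = fir_filter(excitation, cepstrum_to_impulse_response(c, 200))
    energy = sum(r * r for r in reference)
    for ir_length in (16, 32, 64, 128):
        y = fir_filter(excitation, cepstrum_to_impulse_response(c, ir_length))
        err = sum((r - v) ** 2 for r, v in zip(reference, y))
        cost = multiply_adds_per_frame(order, ir_length, 80)
        print(f"L={ir_length:3d}  ops/frame={cost:6d}  rel. error={err / energy:.2e}")
```

The filtering cost grows linearly with L for every output sample, and the per-frame recursion grows with both L and M. Truncation error therefore trades off against computation, which is consistent with the reported finding that both quantities can be reduced only to some degree before quality suffers.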