Thai speech processing technology: A review

This paper presents a comprehensive review of Thai speech technology, from its beginnings in the early 1960s to 2005. Thai is the official language of Thailand and is spoken by over 60 million people worldwide. Like Chinese, it is a tonal language. It is written with the Thai alphabet but has no explicit word boundaries, as is also the case in several other Asian languages, such as Japanese and Chinese. It does have explicit tone marks, as do the languages of the neighboring countries Laos and Vietnam. Because of these characteristics, research and development of language and speech processing specifically for Thai is both necessary and challenging. This paper reviews the progress of Thai speech technology in five areas of research: fundamental analyses and tools, text-to-speech synthesis (TTS), automatic speech recognition (ASR), speech applications, and language resources. The paper closes by reviewing the progress and focus of Thai speech research, as measured by the number of publications in each research area, and by suggesting possible directions for future research.
